Inconsistent results when using fp8?

#22
by songwang41 - opened
vllm serve /share5/projects/llm/models/weight/Kimi-K2-Instruct-0905 \
  --distributed-executor-backend ray \
  --tensor-parallel-size 16 \
  --host 0.0.0.0 --port 8080 \
  --served-model-name kimi-k2-instruct-0905 \
  --trust-remote-code \
  --max-model-len 131072 \
  --max-num-seqs 4 \
  --gpu-memory-utilization 0.95 \
  --quantization fp8 \
  --kv-cache-dtype fp8 \
  --calculate-kv-scales \
  --enable-auto-tool-choice \
  --tool-call-parser kimi_k2

I served kimi-k2-instruct-0905 on 16 H100 GPUs with the command above. When I run inference against the endpoint, I get some inconsistent results. Any clues? Is my hosting of the model correct?
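
Roughly how the endpoint is queried (a minimal sketch, not the exact client code; the host, port, and prompt are placeholders, and the model name comes from `--served-model-name` above):

```python
# Minimal sketch of a single request against the vLLM OpenAI-compatible server.
# Host/port and the prompt are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="kimi-k2-instruct-0905",  # matches --served-model-name
    messages=[{"role": "user", "content": "How long does the meeting last?"}],
    max_tokens=256,
    temperature=0,
)
choice = resp.choices[0]
print(choice.message.content)
print("finish_reason:", choice.finish_reason)
print("completion_tokens:", resp.usage.completion_tokens)
```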

The original prompt (11,701 tokens) consistently fails with kimi-k2:

- 5/5 attempts returned an empty response
- stop_reason: 163586 (appears to be an internal error code)
- completion_tokens: 1 (only generates 1 token before stopping)
Comparison:

| Prompt Type | Tokens | kimi-k2 | Claude | GPT-5 |
|---|---|---|---|---|
| Simple (same question) | 136 | ✅ Works | ✅ Works | ✅ Works |
| Original complex | 11,701 | ❌ Empty | ✅ Works | ✅ Works |
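
The 5/5 count comes from repeating the same request and counting empty completions. A sketch of that loop, assuming the `client` from the snippet above and a hypothetical `long_prompt` variable holding the original 11,701-token prompt:

```python
# Repeat the identical request and count empty responses.
# `long_prompt` is a hypothetical stand-in for the original 11,701-token prompt.
def count_empty(prompt: str, n: int = 5) -> int:
    empty = 0
    for _ in range(n):
        resp = client.chat.completions.create(
            model="kimi-k2-instruct-0905",
            messages=[{"role": "user", "content": prompt}],
            max_tokens=256,
            temperature=0,
        )
        choice = resp.choices[0]
        if not (choice.message.content or "").strip():
            empty += 1
        # stop_reason is a vLLM-specific extra field on each choice
        print(f"completion_tokens={resp.usage.completion_tokens} "
              f"finish_reason={choice.finish_reason} "
              f"stop_reason={getattr(choice, 'stop_reason', None)}")
    return empty

print(f"{count_empty(long_prompt)}/5 attempts returned an empty response")
```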

Investigation with prompt length:

======================================================================
FINDING KIMI-K2 TOKEN THRESHOLD (8K-15K Range)

✅ 1,299 prompt tokens | completion: 12 | The meeting lasts 35 minutes and 25 seconds.
✅ 2,499 prompt tokens | completion: 35 | The meeting lasts 35 minutes and 25 seconds, c
✅ 3,699 prompt tokens | completion: 22 | The meeting lasts 35 minutes and 25 seconds, i
✅ 4,899 prompt tokens | completion: 35 | The meeting lasts 35 minutes and 25 seconds, c
✅ 6,099 prompt tokens | completion: 22 | The meeting lasts 35 minutes and 25 seconds, i
✅ 7,299 prompt tokens | completion: 30 | The meeting lasts 35 minutes and 25 seconds, c
✅ 8,499 prompt tokens | completion: 24 | The meeting lasts 35 minutes and 25 seconds, for a
✅ 9,699 prompt tokens | completion: 34 | The meeting lasts 35 minutes and 25 seconds, c
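
The sweep above pads a fixed question with filler text until the prompt reaches a target token count. A sketch of that harness, assuming the `client` from the first snippet; the filler text, question wording, and local tokenizer path are illustrative assumptions, and token counts are approximate:

```python
# Build prompts of roughly the target length and probe the endpoint at each step.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "/share5/projects/llm/models/weight/Kimi-K2-Instruct-0905",
    trust_remote_code=True,
)
FILLER = "The meeting started at 09:00:00 and ended at 09:35:25. Attendees reviewed the agenda. "
QUESTION = "How long did the meeting last?"

def build_prompt(target_tokens: int) -> str:
    body = ""
    while len(tok.encode(body + QUESTION)) < target_tokens:
        body += FILLER
    return body + QUESTION

for target in range(1_299, 9_700, 1_200):
    resp = client.chat.completions.create(
        model="kimi-k2-instruct-0905",
        messages=[{"role": "user", "content": build_prompt(target)}],
        max_tokens=64,
        temperature=0,
    )
    text = (resp.choices[0].message.content or "").strip()
    mark = "OK   " if text else "EMPTY"
    print(f"{mark} {resp.usage.prompt_tokens:>6} prompt tokens | "
          f"completion: {resp.usage.completion_tokens} | {text[:50]}")
```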

======================================================================
BINARY SEARCH FOR KIMI-K2 TOKEN THRESHOLD

❌ 12,099 prompt tokens | completion: 1 | (empty)
❌ 14,099 prompt tokens | completion: 1 | (empty)
✅ 16,099 prompt tokens | completion: 12 | The meeting lasts **35 minutes and 25 se
❌ 18,099 prompt tokens | completion: 1 | (empty)
✅ 20,099 prompt tokens | completion: 12 | The meeting lasts **35 minutes and 25 se
❌ 22,099 prompt tokens | completion: 1 | (empty)
✅ 24,099 prompt tokens | completion: 21 | The meeting lasts **35 minutes and 25 se
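
The second pass above steps through the 12k-24k range in 2,000-token increments rather than doing a strict bisection; a minimal bisection between the last good and first failing length might look like the sketch below, assuming `client` and `build_prompt()` from the earlier snippets. Note that the alternating pass/fail results (16,099 and 20,099 work while 18,099 does not) suggest the failure is not a clean monotonic threshold, so this only narrows the region of interest:

```python
# Rough bisection between a known-good prompt length and a failing one.
# Caveat: the results above alternate, so a hard length threshold may not exist.
def empty_at(target_tokens: int) -> bool:
    resp = client.chat.completions.create(
        model="kimi-k2-instruct-0905",
        messages=[{"role": "user", "content": build_prompt(target_tokens)}],
        max_tokens=64,
        temperature=0,
    )
    return not (resp.choices[0].message.content or "").strip()

lo, hi = 9_699, 12_099  # last good length from the sweep / first observed failure
while hi - lo > 500:
    mid = (lo + hi) // 2
    if empty_at(mid):
        hi = mid  # failure reproduced: search lower
    else:
        lo = mid  # still fine: search higher
print(f"empty responses start somewhere between {lo} and {hi} prompt tokens")
```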

songwang41 changed discussion title from "Inconsistent results" to "Inconsistent results when using fp8?"
