Question on serving quantized version in vLLM
#39 opened by x5fu
I'm getting the error `ValueError: np.uint32(39) is not a valid GGMLQuantizationType` when trying to serve the quantized version with vLLM v0.11.1.
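For context, this is roughly how I'm loading it, as a minimal sketch: the file name is hypothetical, and I'm assuming vLLM's GGUF loading path, which takes a local .gguf file plus a separate tokenizer argument:

```python
# Minimal repro sketch (file name hypothetical): loading a local GGUF
# checkpoint through vLLM's Python API. The error above is raised while
# the GGUF metadata is parsed via the gguf package.
from vllm import LLM

llm = LLM(
    model="./gpt-oss-20b-mxfp4.gguf",  # hypothetical local GGUF path
    tokenizer="openai/gpt-oss-20b",    # GGUF files need an external tokenizer in vLLM
)
```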
However, serving the original gpt-oss-20b from Hugging Face with vLLM works fine, so I'm assuming MXFP4 is already well supported in vLLM. Does anyone have any clue what's going on?
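In case it helps anyone triage, here's a quick check of whether the installed gguf package even recognizes type 39; my assumption, based on the error text, is that 39 is the MXFP4 quantization type used by the gpt-oss checkpoints:

```python
# Diagnostic sketch: does the installed gguf package know quantization
# type 39? Assumption (mine): 39 corresponds to MXFP4 in newer gguf
# releases; older releases raise ValueError for unknown type values.
from gguf.constants import GGMLQuantizationType

try:
    print("Type 39 is known:", GGMLQuantizationType(39).name)
except ValueError:
    print("Type 39 is unknown to this gguf version; "
          "a newer gguf release may be needed.")
```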
BTW, Merry Christmas and Happy New Year!