Question on serving the quantized version in vLLM

#39 opened by x5fu

I'm getting the error `ValueError: np.uint32(39) is not a valid GGMLQuantizationType` when trying to serve the quantized version with vLLM v0.11.1.
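For what it's worth, I can reproduce what looks like the same failure directly from the `gguf` Python package, which I'm guessing is what vLLM's GGUF loader uses to decode tensor types (just my guess, not confirmed from the vLLM source):

```python
# Diagnostic sketch: check whether the installed `gguf` package knows
# quantization type 39 (MXFP4 in recent llama.cpp releases, as far as I can tell).
from importlib.metadata import version

import gguf

print("gguf package version:", version("gguf"))
try:
    qt = gguf.GGMLQuantizationType(39)
    print("quant type 39 is known:", qt.name)
except ValueError as e:
    # Same message as the vLLM traceback -- suggests an outdated gguf package.
    print("not recognized:", e)
    print("maybe try: pip install -U gguf")
```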

However, serving the original gpt-oss-20b from Hugging Face with vLLM works fine, so I'm assuming MXFP4 itself is already well supported in vLLM. Does anyone have any clue what's going on?
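In case it helps, here's roughly what the two paths look like through vLLM's Python API. I'm assuming the quantized version here is a GGUF file, since `GGMLQuantizationType` comes from the GGUF code path; the file path and tokenizer repo below are placeholders, not my exact setup:

```python
from vllm import LLM

# Path 1: original MXFP4 checkpoint from the Hub -- this works for me.
llm = LLM(model="openai/gpt-oss-20b")

# Path 2: the GGUF-quantized file -- this is where the ValueError shows up.
# vLLM's GGUF support takes a local .gguf file plus an external tokenizer.
# llm = LLM(
#     model="/path/to/gpt-oss-20b-mxfp4.gguf",  # placeholder path
#     tokenizer="openai/gpt-oss-20b",
# )
```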

BTW, merry Christmas and happy new year!
