Question on serving quantized version in vLLM
#39 opened by x5fu
I'm getting the error `ValueError: np.uint32(39) is not a valid GGMLQuantizationType` when trying to serve the quantized version with vLLM v0.11.1.
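For context, this is roughly how I'm loading it, as a minimal sketch: the file name is hypothetical, and I'm assuming vLLM's GGUF loading path, which takes a local .gguf file plus a separate tokenizer argument:

```python
# Minimal repro sketch (file name hypothetical): loading a local GGUF
# checkpoint through vLLM's Python API. The error above is raised while
# the GGUF metadata is parsed via the gguf package.
from vllm import LLM

llm = LLM(
    model="./gpt-oss-20b-mxfp4.gguf",  # hypothetical local GGUF path
    tokenizer="openai/gpt-oss-20b",    # GGUF files need an external tokenizer in vLLM
)
```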
However, serving the original gpt-oss-20b from Hugging Face with vLLM works fine, so I'm assuming MXFP4 is already well supported in vLLM. Does anyone have any clue what's going on?
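In case it helps anyone triage, here's a quick check of whether the installed gguf package even recognizes type 39; my assumption, based on the error text, is that 39 is the MXFP4 quantization type used by the gpt-oss checkpoints:

```python
# Diagnostic sketch: does the installed gguf package know quantization
# type 39? Assumption (mine): 39 corresponds to MXFP4 in newer gguf
# releases; older releases raise ValueError for unknown type values.
from gguf.constants import GGMLQuantizationType

try:
    print("Type 39 is known:", GGMLQuantizationType(39).name)
except ValueError:
    print("Type 39 is unknown to this gguf version; "
          "a newer gguf release may be needed.")
```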
BTW, Merry Christmas and Happy New Year!