How to quantize VL models to NVFP4
#1 opened by UnicornChan
I tried to quantize other VL models following hf_ptq.py in ModelOpt. The quantization completes successfully, but the quantized models don't produce coherent output at inference time; only the official Qwen2.5-VL model provided here works normally. I compared the config.json and model.safetensors.index.json files and found no differences. Why is this happening?
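For reference, this is roughly the flow I'm following, shown as a minimal sketch of ModelOpt post-training quantization. The checkpoint name, the `NVFP4_DEFAULT_CFG` config, and the text-only calibration loop are my assumptions for illustration; the real hf_ptq.py additionally handles dataset loading, vision inputs, and checkpoint export:

```python
# Minimal NVFP4 PTQ sketch with ModelOpt (modelopt.torch.quantization).
# Assumptions: mtq.NVFP4_DEFAULT_CFG is available in the installed
# nvidia-modelopt version, and text-only calibration is sufficient
# to exercise the language-model layers of a VL checkpoint.
import torch
import modelopt.torch.quantization as mtq
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"  # example checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda"
)

# A couple of calibration prompts; a real run would use a proper
# calibration dataset instead of two toy sentences.
calib_texts = [
    "Describe the weather in this image.",
    "What is post-training quantization?",
]

def forward_loop(m):
    # Run calibration data through the model so ModelOpt can collect
    # the activation statistics needed to compute NVFP4 scales.
    with torch.no_grad():
        for text in calib_texts:
            inputs = processor(text=[text], return_tensors="pt").to(m.device)
            m(**inputs)

# Apply NVFP4 weight/activation quantization in place.
model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)
```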
Hi @UnicornChan, would you mind raising this request on the TensorRT Model Optimizer issue tracker? https://github.com/NVIDIA/TensorRT-Model-Optimizer/issues