How to quantize VL models to NVFP4
#1 opened by UnicornChan
I tried to quantize other VL models following hf_ptq.py in ModelOpt. The quantization completes successfully, but the quantized models don't produce coherent output at inference time; only the official Qwen2.5-VL model provided here works normally. I compared the config.json and model.safetensors.index.json files and found no differences. Why is this happening?
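For reference, this is roughly the flow I'm following, shown as a minimal sketch of ModelOpt post-training quantization. The checkpoint name, the `NVFP4_DEFAULT_CFG` config, and the text-only calibration loop are my assumptions for illustration; the real hf_ptq.py additionally handles dataset loading, vision inputs, and checkpoint export:

```python
# Minimal NVFP4 PTQ sketch with ModelOpt (modelopt.torch.quantization).
# Assumptions: mtq.NVFP4_DEFAULT_CFG is available in the installed
# nvidia-modelopt version, and text-only calibration is sufficient
# to exercise the language-model layers of a VL checkpoint.
import torch
import modelopt.torch.quantization as mtq
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"  # example checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda"
)

# A couple of calibration prompts; a real run would use a proper
# calibration dataset instead of two toy sentences.
calib_texts = [
    "Describe the weather in this image.",
    "What is post-training quantization?",
]

def forward_loop(m):
    # Run calibration data through the model so ModelOpt can collect
    # the activation statistics needed to compute NVFP4 scales.
    with torch.no_grad():
        for text in calib_texts:
            inputs = processor(text=[text], return_tensors="pt").to(m.device)
            m(**inputs)

# Apply NVFP4 weight/activation quantization in place.
model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)
```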
Hi @UnicornChan, would you mind raising this request on the TensorRT Model Optimizer issue tracker? https://github.com/NVIDIA/TensorRT-Model-Optimizer/issues