LightOnOCR-2-1B-NVFP4

This is an NVFP4 variant of lightonai/LightOnOCR-2-1B, quantized with NVIDIA Model Optimizer post-training quantization (PTQ) for compatibility with vLLM's `modelopt_fp4` quantization backend.

Upstream Model

lightonai/LightOnOCR-2-1B

Quantization Notes

  • Quantization format: NVFP4
  • Runtime target: vLLM with quantization="modelopt_fp4"
  • Tooling: NVIDIA Model Optimizer
  • Selective policy used for current vLLM compatibility:
    • Quantized: language model and vision projection linear layers
    • Kept BF16: vision encoder and vision patch merger
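The selective policy above can be sketched as a module-name filter. The module paths below are illustrative assumptions, not the checkpoint's exact parameter names, which depend on the model's state dict:

```python
# Hedged sketch of the selective NVFP4 policy described above.
# Patterns are assumptions for illustration; inspect the checkpoint
# for the real module names before relying on them.
QUANTIZE_PATTERNS = ("language_model.", "multi_modal_projector.")
KEEP_BF16_PATTERNS = ("vision_tower.", "patch_merger.")

def should_quantize(module_name: str) -> bool:
    """Return True if a linear layer should be quantized to NVFP4."""
    if any(p in module_name for p in KEEP_BF16_PATTERNS):
        return False  # vision encoder and patch merger stay BF16
    return any(p in module_name for p in QUANTIZE_PATTERNS)
```

A filter like this would be passed to the quantization tool's layer-selection config so only the language-model and projector linears receive NVFP4 weights.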

Usage (vLLM)

vllm serve switzerchees/LightOnOCR-2-1B-NVFP4 \
  --quantization modelopt_fp4 \
  --limit-mm-per-prompt '{"image": 1}' \
  --mm-processor-cache-gb 0 \
  --no-enable-prefix-caching
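Once the server is up, requests go through vLLM's OpenAI-compatible chat completions API. A minimal sketch of building such a request, assuming the default endpoint `http://localhost:8000/v1/chat/completions`; the placeholder image bytes and prompt text are assumptions, not the model's required format:

```python
import base64

# Hypothetical client sketch for the server started above.
def build_ocr_request(image_bytes: bytes,
                      prompt: str = "Transcribe the text in this image.") -> dict:
    """Build an OpenAI-style chat payload with an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "switzerchees/LightOnOCR-2-1B-NVFP4",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                {"type": "text", "text": prompt},
            ],
        }],
    }

# Use real PNG bytes in practice, then POST the payload to
# http://localhost:8000/v1/chat/completions (e.g. with requests or httpx).
payload = build_ocr_request(b"\x89PNG placeholder")
```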

License

This derivative keeps the same license as the upstream model: Apache-2.0. See the original model card for upstream attribution and details: lightonai/LightOnOCR-2-1B
