MMLU PRO Benchmark
#3, opened by sevapru
Hi, I ran your model through the MMLU-Pro benchmark overnight (on a Jetson Thor in 120W mode).
MMLU-Pro Benchmark
Model: GadflyII/Qwen3-Coder-Next-NVFP4
Endpoint: http://localhost:8868/v1
Subjects: 14
Total questions: 12032
Workers: 16
Max tokens: 4096
Subject             Correct    Wrong    Accuracy
------------------------------------------------
biology                 646       71       90.1%
business                654      135       82.9%
chemistry               958      174       84.6%
computer science        345       65       84.1%
economics               709      135       84.0%
engineering             629      340       64.9%
health                  621      197       75.9%
history                 261      120       68.5%
law                     639      462       58.0%
math                   1237      114       91.6%
other                   701      223       75.9%
philosophy              368      131       73.7%
physics                1125      174       86.6%
psychology              644      154       80.7%
------------------------------------------------
TOTAL                  9537     2495      79.26%
Let me know if you want to add this information to the model card.
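For anyone who wants to double-check the numbers, here is a minimal Python sketch that recomputes the accuracies from the reported correct/wrong counts. It is not the script used for the run; the names `RESULTS`, `accuracy`, `micro_average`, and `macro_average` are made up for illustration. The TOTAL row matches the micro average (all questions pooled), which differs from the unweighted mean of the per-subject accuracies.

```python
# Counts copied from the results table: subject -> (correct, wrong).
RESULTS = {
    "biology": (646, 71),
    "business": (654, 135),
    "chemistry": (958, 174),
    "computer science": (345, 65),
    "economics": (709, 135),
    "engineering": (629, 340),
    "health": (621, 197),
    "history": (261, 120),
    "law": (639, 462),
    "math": (1237, 114),
    "other": (701, 223),
    "philosophy": (368, 131),
    "physics": (1125, 174),
    "psychology": (644, 154),
}


def accuracy(correct, wrong):
    """Accuracy in percent for one subject."""
    return 100.0 * correct / (correct + wrong)


def micro_average(results):
    """Pool all questions, then compute accuracy (what the TOTAL row reports)."""
    total_correct = sum(c for c, _ in results.values())
    total_questions = sum(c + w for c, w in results.values())
    return 100.0 * total_correct / total_questions


def macro_average(results):
    """Unweighted mean of per-subject accuracies (every subject counts equally)."""
    return sum(accuracy(c, w) for c, w in results.values()) / len(results)


if __name__ == "__main__":
    for subject, (c, w) in RESULTS.items():
        print(f"{subject:<18}{accuracy(c, w):6.1f}%")
    print(f"micro average: {micro_average(RESULTS):.2f}%")
    print(f"macro average: {macro_average(RESULTS):.2f}%")
```

The macro average comes out slightly lower than the micro average here because the largest subjects (math, physics, chemistry) are also among the highest scoring.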
Thanks! I will add it to the card!
Did you also test the BF16 as a baseline?
No, I only have (lol) 128GB of VRAM, so BF16 wouldn't fit without swap, and running the whole benchmark without concurrency and enough cache space would be too painful.
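A rough back-of-the-envelope check of why BF16 would not fit, as a Python sketch. The parameter count is an assumption (about 80 billion; the thread does not state it), and the helper `weight_memory_gb` is hypothetical. The estimate covers weights only and ignores the KV cache, activations, and the per-block scale factors that a 4-bit format such as NVFP4 stores on top of the 4-bit values.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the model weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# ASSUMPTION: roughly 80 billion parameters; this figure is not from the thread.
ASSUMED_PARAMS = 80e9
AVAILABLE_GB = 128  # memory reported by the poster

bf16_gb = weight_memory_gb(ASSUMED_PARAMS, 16)  # 16 bits per weight
fp4_gb = weight_memory_gb(ASSUMED_PARAMS, 4)    # 4 bits per weight, scales ignored

print(f"BF16 weights: ~{bf16_gb:.0f} GB (fits in {AVAILABLE_GB} GB: {bf16_gb < AVAILABLE_GB})")
print(f"4-bit weights: ~{fp4_gb:.0f} GB (fits in {AVAILABLE_GB} GB: {fp4_gb < AVAILABLE_GB})")
```

Under that assumption the BF16 weights alone would exceed 128GB before any cache space is reserved, while the 4-bit weights leave plenty of room for concurrent requests.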