MMLU-Pro Benchmark

by sevapru - opened 11 days ago

Hi, I ran your model on the MMLU-Pro benchmark last night (on a Jetson Thor in 120W mode).

  MMLU-Pro Benchmark
  Model: GadflyII/Qwen3-Coder-Next-NVFP4
  Endpoint: http://localhost:8868/v1
  Subjects: 14
  Total questions: 12032
  Workers: 16
  Max tokens: 4096
  Subject                    Correct    Wrong   Accuracy
..
  biology                        646       71     90.1%
  business                       654      135     82.9%
  chemistry                      958      174     84.6%
  computer science               345       65     84.1%
  economics                      709      135     84.0%
  engineering                    629      340     64.9%
  health                         621      197     75.9%
  history                        261      120     68.5%
  law                            639      462     58.0%
  math                          1237      114     91.6%
  other                          701      223     75.9%
  philosophy                     368      131     73.7%
  physics                       1125      174     86.6%
  psychology                     644      154     80.7%
...
  TOTAL                         9537     2495    79.26%
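
For anyone who wants to sanity-check the table, here is a minimal sketch that recomputes the TOTAL row from the per-subject counts. The counts are copied from the output above; the variable names and the macro-average line are illustrative additions and not part of the original benchmark script.

```python
# Per-subject (correct, wrong) counts copied from the benchmark output above.
results = {
    "biology": (646, 71),
    "business": (654, 135),
    "chemistry": (958, 174),
    "computer science": (345, 65),
    "economics": (709, 135),
    "engineering": (629, 340),
    "health": (621, 197),
    "history": (261, 120),
    "law": (639, 462),
    "math": (1237, 114),
    "other": (701, 223),
    "philosophy": (368, 131),
    "physics": (1125, 174),
    "psychology": (644, 154),
}


def accuracy(correct: int, wrong: int) -> float:
    """Accuracy in percent for one row of the table."""
    return 100.0 * correct / (correct + wrong)


total_correct = sum(c for c, _ in results.values())
total_wrong = sum(w for _, w in results.values())

# Question-weighted (micro) average: this is what the TOTAL row reports.
micro = accuracy(total_correct, total_wrong)

# Unweighted mean over subjects (macro average), shown only for comparison.
macro = sum(accuracy(c, w) for c, w in results.values()) / len(results)

print(f"questions: {total_correct + total_wrong}")
print(f"micro accuracy: {micro:.2f}%")
print(f"macro accuracy: {macro:.2f}%")
```

The TOTAL row is weighted by question count, so large subjects such as math and physics pull it up more than small ones such as history.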

Let me know if you'd like to add these results to the model card.

GadflyII

Owner 11 days ago

Thanks! I will add it to the card!

Did you also test the BF16 version as a baseline?

sevapru

11 days ago

No, I only have (lol) 128GB of VRAM, so it wouldn't fit without swap memory. It would be too painful to run the whole benchmark without concurrency and enough caching space.
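
A rough back-of-the-envelope sketch of why BF16 would not fit. The parameter count (about 80B) is an assumption that was not stated in this thread, and the 4.5 bits per weight for NVFP4 assumes 4-bit values plus one 8-bit scale per 16-weight block. Real checkpoints may keep some layers in higher precision, and the KV cache and activations need extra room on top of the weights.

```python
GB = 1e9  # decimal gigabytes, to match the "128GB" figure above

# ASSUMPTION: total parameter count, not taken from this thread.
N_PARAMS = 80e9


def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB."""
    return n_params * bits_per_weight / 8 / GB


bf16_gb = weight_memory_gb(N_PARAMS, 16)

# ASSUMPTION: 4-bit values + one 8-bit scale per 16 weights = 4.5 bits/weight.
nvfp4_gb = weight_memory_gb(N_PARAMS, 4 + 8 / 16)

AVAILABLE_GB = 128

print(f"BF16 weights:  ~{bf16_gb:.0f} GB (fits: {bf16_gb < AVAILABLE_GB})")
print(f"NVFP4 weights: ~{nvfp4_gb:.0f} GB (fits: {nvfp4_gb < AVAILABLE_GB})")
```

Under these assumptions the BF16 weights alone exceed 128GB, while the NVFP4 weights leave plenty of headroom for caching and 16 concurrent workers.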
