KLD Measurement Methodology?
Just curious, how are you measuring KLD? I pulled down your 3.0bpw quant and ran it through eval/compare_q.py, and got 0.17608872470218068, which is slightly higher than the result for my own 3bpw quant and much higher than either of the numbers in your table.
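For reference, by KLD I mean the mean KL divergence between the original model's and the quant's next-token distributions over the same evaluation tokens. Here's a rough illustration of that calculation with toy tensors - just a sketch of the metric itself, not what compare_q.py or model_diff.py actually does internally:

```python
import torch
import torch.nn.functional as F

def mean_token_kld(ref_logits: torch.Tensor, quant_logits: torch.Tensor) -> float:
    """Mean KL(P_ref || P_quant) over token positions.

    Both tensors are [num_tokens, vocab_size] logits collected by running the
    reference model and the quant over the same evaluation tokens.
    """
    ref_logprobs = F.log_softmax(ref_logits.float(), dim=-1)
    quant_logprobs = F.log_softmax(quant_logits.float(), dim=-1)
    # KL(P || Q) = sum_v P(v) * (log P(v) - log Q(v)), averaged over positions
    kld = (ref_logprobs.exp() * (ref_logprobs - quant_logprobs)).sum(dim=-1)
    return kld.mean().item()

# Toy usage with random logits, just to show the shapes involved.
ref = torch.randn(2048, 32000)
quant = ref + 0.05 * torch.randn_like(ref)  # stand-in for quantization noise
print(mean_token_kld(ref, quant))
```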
One thing I noticed is that we're using different evaluation scripts - you used eval/compare_q.py, while I use eval/model_diff.py:
eval/model_diff.py -ma "$QUANT_PATH" -mb "$ORIG_PATH" -r 100
I'm not sure if that fully explains the difference, but it could be a factor worth looking into.
Another thing to keep in mind: unless I'm missing something (I can't recheck right now), the evaluation scoring appears to be non-deterministic in recent versions of exllama - the numbers differ slightly across repeated runs. At least, I observed this while quantizing Qwen3-Coder-Next on exllama3 v0.0.20, though it's possible that was a bug specific to that model's inference path (context).
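If you want to sanity-check that on your side, the simplest thing is to run the exact same comparison twice and diff the reported numbers. A throwaway sketch (the paths are placeholders, and I'm only reusing the flags from the command above):

```python
import subprocess

# Placeholders - point these at an actual quant and the original weights.
QUANT_PATH = "/models/my-quant-3bpw"
ORIG_PATH = "/models/original"

cmd = ["python", "eval/model_diff.py", "-ma", QUANT_PATH, "-mb", ORIG_PATH, "-r", "100"]

# Run the same evaluation twice and compare the output line by line.
runs = [subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        for _ in range(2)]

if runs[0] == runs[1]:
    print("Outputs are identical - evaluation looks deterministic.")
else:
    for a, b in zip(runs[0].splitlines(), runs[1].splitlines()):
        if a != b:
            print(f"run1: {a}\nrun2: {b}")
```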
For MiniMax2.5, I used commit 701afb9 both to perform the quantization and to compute the metrics. If you're interested, I've just uploaded the per-layer evaluation logs (the standard output of model_diff.py) so you can compare them with the numbers you got.
Let me know if you need any other information.
P.S. Thanks for publishing exl3 quants! I've used your quants a lot - it's just that in recent months I've wanted to figure all of this out on my own.