KLD Measurement Methodology?
Just curious, how are you measuring KLD? I pulled down your 3.0bpw quant and ran it through eval/compare_q.py, and got 0.17608872470218068, which is slightly higher than the result for my own 3bpw quant and much higher than either of the numbers in your table.
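For reference, by KLD I mean the mean KL divergence between the original model's and the quant's next-token distributions over the same evaluation tokens. Here's a rough illustration of that calculation with toy tensors - just a sketch of the metric itself, not what compare_q.py or model_diff.py actually does internally:

```python
import torch
import torch.nn.functional as F

def mean_token_kld(ref_logits: torch.Tensor, quant_logits: torch.Tensor) -> float:
    """Mean KL(P_ref || P_quant) over token positions.

    Both tensors are [num_tokens, vocab_size] logits collected by running the
    reference model and the quant over the same evaluation tokens.
    """
    ref_logprobs = F.log_softmax(ref_logits.float(), dim=-1)
    quant_logprobs = F.log_softmax(quant_logits.float(), dim=-1)
    # KL(P || Q) = sum_v P(v) * (log P(v) - log Q(v)), averaged over positions
    kld = (ref_logprobs.exp() * (ref_logprobs - quant_logprobs)).sum(dim=-1)
    return kld.mean().item()

# Toy usage with random logits, just to show the shapes involved.
ref = torch.randn(2048, 32000)
quant = ref + 0.05 * torch.randn_like(ref)  # stand-in for quantization noise
print(mean_token_kld(ref, quant))
```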
One thing I noticed is that we're using different evaluation scripts - you used eval/compare_q.py, while I use eval/model_diff.py:
eval/model_diff.py -ma "$QUANT_PATH" -mb "$ORIG_PATH" -r 100
I'm not sure if that fully explains the difference, but it could be a factor worth looking into.
Another thing to keep in mind: unless I'm missing something (I can't recheck right now), the evaluation scoring appears to be non-deterministic in recent versions of exllama - the numbers differ slightly across repeated runs. At least, I observed this while quantizing Qwen3-Coder-Next on exllama3 v0.0.20, though it's possible that was a bug specific to that model's inference path (context).
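If you want to sanity-check that on your side, the simplest thing is to run the exact same comparison twice and diff the reported numbers. A throwaway sketch (the paths are placeholders, and I'm only reusing the flags from the command above):

```python
import subprocess

# Placeholders - point these at an actual quant and the original weights.
QUANT_PATH = "/models/my-quant-3bpw"
ORIG_PATH = "/models/original"

cmd = ["python", "eval/model_diff.py", "-ma", QUANT_PATH, "-mb", ORIG_PATH, "-r", "100"]

# Run the same evaluation twice and compare the output line by line.
runs = [subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        for _ in range(2)]

if runs[0] == runs[1]:
    print("Outputs are identical - evaluation looks deterministic.")
else:
    for a, b in zip(runs[0].splitlines(), runs[1].splitlines()):
        if a != b:
            print(f"run1: {a}\nrun2: {b}")
```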
For MiniMax2.5, I used commit 701afb9 both to perform the quantization and to compute the metrics. If you're interested, I've just uploaded the per-layer evaluation logs (the standard output of model_diff.py) so you can compare them with the numbers you got.
Let me know if you need any other information.
P.S. Thanks for publishing exl3 quants! I've used your quants a lot - it's just that in recent months I've wanted to figure all of this out on my own.