Can you share a perplexity (PPL) graph for your quants?
tysm
Perplexity is not a good measure of quantization accuracy, and is actually one of the worst ways to measure accuracy recovery.
The only reason it's so popular is that it's the easiest metric to replicate.
See: https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs#calibration-dataset-overfitting
"Most frameworks report perplexity and KL Divergence using a test set of Wikipedia articles. However, we noticed using the calibration dataset which is also Wikipedia related causes quants to overfit, and attain lower perplexity scores. We utilize Calibration_v3 and Calibration_v5 datasets for fair testing which includes some wikitext data amongst other data. Also instruct models have unique chat templates, and using text only calibration datasets is not effective for instruct models (base models yes). In fact most imatrix GGUFs are typically calibrated with these issues. As a result, they naturally perform better on KL Divergence benchmarks that also use Wikipedia data, since the model is essentially optimized for that domain."
Engineers love numbers. We need a way to measure how much we lose by using a smaller quant.
@danielhanchen - Could you please release the KL divergence for each quant level vs. the unquantized baseline? You point to the MSFT research paper on KL divergence, but it would be great to back that up with KLD plots for every release you do.
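To make the ask concrete, something like the sketch below: mean KLD against the BF16 baseline, plotted per quant level, for each release. (File names and the quant list are placeholders, and the KLD helper just mirrors the sketch above, not your actual eval pipeline; llama.cpp's llama-perplexity tool also has a KL-divergence mode that produces these numbers, if I remember the flags right.)

```python
import numpy as np
import matplotlib.pyplot as plt

def mean_kld(base_logits, quant_logits):
    # Mean per-token KL(base || quant) over the full vocab.
    z_b = base_logits - base_logits.max(axis=-1, keepdims=True)
    z_q = quant_logits - quant_logits.max(axis=-1, keepdims=True)
    logp_b = z_b - np.log(np.exp(z_b).sum(axis=-1, keepdims=True))
    logp_q = z_q - np.log(np.exp(z_q).sum(axis=-1, keepdims=True))
    return float((np.exp(logp_b) * (logp_b - logp_q)).sum(axis=-1).mean())

# Hypothetical logit dumps for each quant, collected on the same eval text
# as the BF16 baseline (paths are placeholders, not real release artifacts).
quants = ["Q2_K_XL", "Q3_K_XL", "Q4_K_XL", "Q5_K_M", "Q8_0"]
base = np.load("logits_bf16.npy")
klds = [mean_kld(base, np.load(f"logits_{q}.npy")) for q in quants]

plt.plot(quants, klds, marker="o")
plt.yscale("log")  # KLD typically spans orders of magnitude across quant levels
plt.xlabel("Quant level")
plt.ylabel("Mean KLD vs BF16 baseline (lower is better)")
plt.savefig("kld_per_quant.png")
```

One chart like that per release would give us the "how much do we lose at each size" number engineers are after.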