Can you share a perplexity (PPL) graph for your quants?
tysm
Perplexity is not a good measure of quantization accuracy, and is actually one of the worst ways to measure accuracy recovery.
The only reason it's so popular is that it's the easiest metric to replicate.
See: https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs#calibration-dataset-overfitting
"Most frameworks report perplexity and KL Divergence using a test set of Wikipedia articles. However, we noticed using the calibration dataset which is also Wikipedia related causes quants to overfit, and attain lower perplexity scores. We utilize Calibration_v3 and Calibration_v5 datasets for fair testing which includes some wikitext data amongst other data. Also instruct models have unique chat templates, and using text only calibration datasets is not effective for instruct models (base models yes). In fact most imatrix GGUFs are typically calibrated with these issues. As a result, they naturally perform better on KL Divergence benchmarks that also use Wikipedia data, since the model is essentially optimized for that domain."
Engineers love numbers. We need a way to measure how much we lose by using a smaller quant.
@danielhanchen - Could you please release the KL divergence for each quant level vs. the unquantized baseline? You point to the MSFT research paper on KL divergence, but it would be great to back that up with KLD plots for every release you do.
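To make the ask concrete, something like the sketch below: mean KLD against the BF16 baseline, plotted per quant level, for each release. (File names and the quant list are placeholders, and the KLD helper just mirrors the sketch above, not your actual eval pipeline; llama.cpp's llama-perplexity tool also has a KL-divergence mode that produces these numbers, if I remember the flags right.)

```python
import numpy as np
import matplotlib.pyplot as plt

def mean_kld(base_logits, quant_logits):
    # Mean per-token KL(base || quant) over the full vocab.
    z_b = base_logits - base_logits.max(axis=-1, keepdims=True)
    z_q = quant_logits - quant_logits.max(axis=-1, keepdims=True)
    logp_b = z_b - np.log(np.exp(z_b).sum(axis=-1, keepdims=True))
    logp_q = z_q - np.log(np.exp(z_q).sum(axis=-1, keepdims=True))
    return float((np.exp(logp_b) * (logp_b - logp_q)).sum(axis=-1).mean())

# Hypothetical logit dumps for each quant, collected on the same eval text
# as the BF16 baseline (paths are placeholders, not real release artifacts).
quants = ["Q2_K_XL", "Q3_K_XL", "Q4_K_XL", "Q5_K_M", "Q8_0"]
base = np.load("logits_bf16.npy")
klds = [mean_kld(base, np.load(f"logits_{q}.npy")) for q in quants]

plt.plot(quants, klds, marker="o")
plt.yscale("log")  # KLD typically spans orders of magnitude across quant levels
plt.xlabel("Quant level")
plt.ylabel("Mean KLD vs BF16 baseline (lower is better)")
plt.savefig("kld_per_quant.png")
```

One chart like that per release would give us the "how much do we lose at each size" number engineers are after.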