Update README.md
[exllamav3](https://github.com/turboderp-org/exllamav3/) quantizations of [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5). Quantized using commit 89b841d of the dev branch.
Note that tensor parallelism is not currently supported for this architecture, so multi-GPU setups will have a harder time fitting this model than they would otherwise (you'll get more context out of one 96 GB GPU than out of four 24 GB GPUs).
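As a rough sanity check on the GPU hints below, here is a back-of-the-envelope headroom estimate for the 2.00 bpw quant (an illustrative sketch only: it treats "24 GB" as 24×10⁹ bytes and ignores CUDA context overhead, fragmentation, and activation memory):

```python
# Rough VRAM headroom estimate: 2.00 bpw quant (61.054 GiB, per the table)
# spread across 3x24 GB cards. Whatever is left over must hold the KV cache.
GIB = 1024**3

weights_gib = 61.054           # quant size from the table below
gpu_bytes = 3 * 24 * 10**9     # three 24 GB cards, marketing gigabytes
gpu_gib = gpu_bytes / GIB      # ~67.1 GiB addressable in GiB terms

headroom_gib = gpu_gib - weights_gib
print(f"~{headroom_gib:.1f} GiB left for KV cache and activations")
```

That ~6 GiB of slack is what makes the quoted 49152 tokens of FP16 context plausible on that configuration; in practice, real overheads eat into it.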
| Quant | Size | KLD | PPL | GPU Requirement Hint |
| --- | --- | --- | --- | --- |
| [2.00 bpw h6](https://huggingface.co/MikeRoz/MiniMax-M2.5-exl3/tree/2.00bpw_H6) | 61.054 GiB | 0.42365 | 9.31452 | 3x24 GB w/ 49152 FP16 context |
| [2.10 bpw h6](https://huggingface.co/MikeRoz/MiniMax-M2.5-exl3/tree/2.10bpw_H6) (optimized) | 57.292 GiB | 0.36355 | 9.20850 | 3x24 GB w/ 40960 FP16 context |
| [2.50 bpw h6](https://huggingface.co/MikeRoz/MiniMax-M2.5-exl3/tree/2.50bpw_H6) (optimized) | 67.838 GiB | 0.30152 | 8.88802 | 4x24 GB w/ 90112 FP16 context |
| [3.00 bpw h6](https://huggingface.co/MikeRoz/MiniMax-M2.5-exl3/tree/3.00bpw_H6) | 81.613 GiB | 0.17263 | 8.58626 | 4x24 GB w/ 16384 FP16 context |
| [3.06 bpw h6](https://huggingface.co/MikeRoz/MiniMax-M2.5-exl3/tree/3.06bpw_H6) (optimized) | 82.656 GiB | 0.15648 | 8.66856 | 4x24 GB w/ 12288 FP16 context |
| [4.00 bpw h6](https://huggingface.co/MikeRoz/MiniMax-M2.5-exl3/tree/4.00bpw_H6) | 108.087 GiB | 0.07882 | 8.45404 | 6x24 GB w/ 49152 FP16 context |
| [5.00 bpw h6](https://huggingface.co/MikeRoz/MiniMax-M2.5-exl3/tree/5.00bpw_H6) | 134.561 GiB | - | - | 5x24 GB + 1x32 GB w/ 24576 FP16 context (will not load for me with 6x24 GB) |
### KLD and PPL graphs
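For reference, the KLD column reports Kullback–Leibler divergence between the quantized model's next-token distribution and the full-precision model's (lower is closer to the original), while PPL is perplexity. A toy sketch of the KLD computation, using made-up three-token distributions rather than real model outputs:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token probabilities (not taken from the model):
full_precision = [0.70, 0.20, 0.10]
quantized      = [0.60, 0.25, 0.15]

# A small positive number; identical distributions would give 0.0.
print(kl_divergence(full_precision, quantized))
```

This is why KLD shrinks monotonically down the optimized/larger quants in the table: more bits per weight keeps the quantized distribution closer to full precision.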