MikeRoz committed
Commit f411ee8 · verified · 1 Parent(s): 9856166

Update README.md

Files changed (1): README.md (+11 -9)
README.md CHANGED
@@ -12,15 +12,17 @@ tags:
 
 [exllamav3](https://github.com/turboderp-org/exllamav3/) quantizations of [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5). Quantized using commit 89b841d of the dev branch.
 
-| Quant | Size | KLD | PPL |
-| --- | --- | --- | --- |
-| [2.00 bpw h6](https://huggingface.co/MikeRoz/MiniMax-M2.5-exl3/tree/2.00bpw_H6) | 61.054 GiB | 0.42365 | 9.31452 |
-| [2.10 bpw h6](https://huggingface.co/MikeRoz/MiniMax-M2.5-exl3/tree/2.10bpw_H6) (optimized) | 57.292 GiB | 0.36355 | 9.20850 |
-| [2.50 bpw h6](https://huggingface.co/MikeRoz/MiniMax-M2.5-exl3/tree/2.50bpw_H6) (optimized) | 67.838 GiB | 0.30152 | 8.88802 |
-| [3.00 bpw h6](https://huggingface.co/MikeRoz/MiniMax-M2.5-exl3/tree/3.00bpw_H6) | 81.613 GiB | 0.17263 | 8.58626 |
-| [3.06 bpw h6](https://huggingface.co/MikeRoz/MiniMax-M2.5-exl3/tree/3.06bpw_H6) (optimized) | 82.656 GiB | 0.15648 | 8.66856 |
-| [4.00 bpw h6](https://huggingface.co/MikeRoz/MiniMax-M2.5-exl3/tree/4.00bpw_H6) | 108.087 GiB | 0.07882 | 8.45404 |
-| [5.00 bpw h6](https://huggingface.co/MikeRoz/MiniMax-M2.5-exl3/tree/5.00bpw_H6) | 134.561 GiB | - | - |
+Note that tensor parallelism is not currently supported for this architecture, so multi-GPU setups will have a harder time fitting this model than they otherwise would (you will get more context out of a single 96 GB GPU than out of 4x24 GB GPUs).
+
+| Quant | Size | KLD | PPL | GPU Requirement Hint |
+| --- | --- | --- | --- | --- |
+| [2.00 bpw h6](https://huggingface.co/MikeRoz/MiniMax-M2.5-exl3/tree/2.00bpw_H6) | 61.054 GiB | 0.42365 | 9.31452 | 3x24 GB w/ 49152 FP16 context |
+| [2.10 bpw h6](https://huggingface.co/MikeRoz/MiniMax-M2.5-exl3/tree/2.10bpw_H6) (optimized) | 57.292 GiB | 0.36355 | 9.20850 | 3x24 GB w/ 40960 FP16 context |
+| [2.50 bpw h6](https://huggingface.co/MikeRoz/MiniMax-M2.5-exl3/tree/2.50bpw_H6) (optimized) | 67.838 GiB | 0.30152 | 8.88802 | 4x24 GB w/ 90112 FP16 context |
+| [3.00 bpw h6](https://huggingface.co/MikeRoz/MiniMax-M2.5-exl3/tree/3.00bpw_H6) | 81.613 GiB | 0.17263 | 8.58626 | 4x24 GB w/ 16384 FP16 context |
+| [3.06 bpw h6](https://huggingface.co/MikeRoz/MiniMax-M2.5-exl3/tree/3.06bpw_H6) (optimized) | 82.656 GiB | 0.15648 | 8.66856 | 4x24 GB w/ 12288 FP16 context |
+| [4.00 bpw h6](https://huggingface.co/MikeRoz/MiniMax-M2.5-exl3/tree/4.00bpw_H6) | 108.087 GiB | 0.07882 | 8.45404 | 6x24 GB w/ 49152 FP16 context |
+| [5.00 bpw h6](https://huggingface.co/MikeRoz/MiniMax-M2.5-exl3/tree/5.00bpw_H6) | 134.561 GiB | - | - | 5x24 GB + 1x32 GB w/ 24576 FP16 context (will not load for me with 6x24 GB) |
 
 ### K/L-D and PPL graphs
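The checkpoint sizes in the table scale roughly linearly with bits per weight, so you can sanity-check whether an unlisted bpw would fit your hardware before downloading. A minimal back-of-the-envelope sketch, assuming a total parameter count of roughly 230B (an assumption, not a figure from this card) and a small flat overhead for the 6-bit output head and metadata:

```python
# Rough on-disk size estimate for an EXL3 quant: weights stored at `bpw`
# bits per parameter, plus a flat overhead term for the h6 output head,
# embeddings, and file metadata. Both constants below are assumptions
# for illustration, not values taken from the model card.
N_PARAMS = 230e9        # assumed total parameter count
OVERHEAD_GIB = 4.0      # assumed fixed overhead in GiB

def est_size_gib(bpw: float) -> float:
    """Estimated checkpoint size in GiB for a given bits-per-weight."""
    return N_PARAMS * bpw / 8 / 2**30 + OVERHEAD_GIB
```

With these assumed constants, `est_size_gib(2.0)` lands in the same ballpark as the 61 GiB of the 2.00 bpw quant above; the estimate drifts further from the table at higher bpw, since the fixed-overhead assumption is crude.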