ubergarm committed on
Commit 2f8b1db · 1 Parent(s): d2918f9

uploading IQ2_KS

Files changed (2)
  1. README.md +35 -4
  2. images/perplexity.png +2 -2
README.md CHANGED
@@ -173,16 +173,47 @@ numactl -N ${SOCKET} -m ${SOCKET} \
 
  </details>
 
- ## IQ2_KS TODO
- TODO
 
- I'm hoping this IQ2_KS/IQ3_KS mix will run on 96GB VRAM full offload with enough context for agentic use.
  <details>
 
  <summary>👈 Secret Recipe</summary>
 
  ```bash
- echo TODO
  ```
 
  </details>
 
 
  </details>
 
+ ## IQ2_KS 69.800 GiB (2.622 BPW)
+ PPL over 552 chunks for n_ctx=512 = 9.6827 +/- 0.07972
 
  <details>
 
  <summary>👈 Secret Recipe</summary>
 
  ```bash
+ #!/usr/bin/env bash
+
+ custom="
+ # 61 Repeating Layers [0-61]
+
+ # Attention [0-61] GPU
+ blk\..*\.attn_q.*=q8_0
+ blk\..*\.attn_k.*=q8_0
+ blk\..*\.attn_v.*=q8_0
+ blk\..*\.attn_output.*=q8_0
+
+ # Routed Experts Layers [0-61] CPU
+ blk\..*\.ffn_down_exps\.weight=iq3_ks
+ blk\..*\.ffn_(gate|up)_exps\.weight=iq2_ks
+
+ # Non-Repeating Layers
+ token_embd\.weight=iq4_k
+ output\.weight=iq6_k
+ "
+
+ custom=$(
+ echo "$custom" | grep -v '^#' | \
+ sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
+ )
+
+ numactl -N ${SOCKET} -m ${SOCKET} \
+ ./build/bin/llama-quantize \
+ --custom-q "$custom" \
+ --imatrix /mnt/data/models/ubergarm/MiniMax-M2.5-GGUF/imatrix-MiniMax-M2.5-BF16.dat \
+ /mnt/data/models/ubergarm/MiniMax-M2.5-GGUF/MiniMax-M2.5-256x4.9B-BF16-00001-of-00010.gguf \
+ /mnt/data/models/ubergarm/MiniMax-M2.5-GGUF/MiniMax-M2.5-IQ2_KS.gguf \
+ IQ2_KS \
+ 128
  ```
 
  </details>
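The `grep`/`sed` step in the recipe above turns the commented, multi-line rule list into the single comma-separated string passed to `--custom-q`. The following is a minimal sketch of that step in isolation, assuming GNU `sed` (the `-z` flag is a GNU extension). The variable names `rules` and `flat` and the two-rule list are illustrative, not part of the commit, and `printf` stands in for the recipe's `echo`.

```bash
#!/usr/bin/env bash
# Toy rule list laid out like the recipe's $custom string.
# The two rules here are illustrative only (assumption, not the full recipe).
rules="
# Attention
blk\..*\.attn_q.*=q8_0

# Routed experts
blk\..*\.ffn_down_exps\.weight=iq3_ks
"

# 1. grep -v '^#' drops the comment lines.
# 2. sed -z reads the whole input as one record, so '\n+' can match
#    runs of newlines and collapse each run into a single comma.
# 3. The last two substitutions trim the leading and trailing comma.
flat=$(
  printf '%s\n' "$rules" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

printf '%s\n' "$flat"
# prints: blk\..*\.attn_q.*=q8_0,blk\..*\.ffn_down_exps\.weight=iq3_ks
```

Keeping the rules one per line with comments makes the mix easy to read and edit, while the flattening keeps the final command-line argument in the comma-separated form the recipe passes to `llama-quantize`.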
images/perplexity.png CHANGED

Git LFS Details

  • SHA256: 3f4719962637c5d0579073846765d2b661267dada51cdfef56a05adff044359c
  • Pointer size: 131 Bytes
  • Size of remote file: 157 kB

Git LFS Details

  • SHA256: 5f1ef19c1405b576494d9377fa8ef9f9ebd98afc067a389d59e4dc534f4dde74
  • Pointer size: 131 Bytes
  • Size of remote file: 179 kB