uploading IQ2_KS

- README.md +35 -4
- images/perplexity.png +2 -2

README.md CHANGED

@@ -173,16 +173,47 @@ numactl -N ${SOCKET} -m ${SOCKET} \
Before:

</details>

## IQ2_KS

I'm hoping this IQ2_KS/IQ3_KS mix will run fully offloaded on 96GB VRAM with enough context for agentic use.

<details>

<summary>👈 Secret Recipe</summary>

```bash

```

</details>
After:

</details>

## IQ2_KS 69.800 GiB (2.622 BPW)
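The size and bits-per-weight in the heading are mutually consistent; as a quick sanity check (my arithmetic, not part of the commit, assuming GiB means 2^30 bytes):

```bash
# 69.800 GiB * 2^30 bytes/GiB * 8 bits/byte, divided by 2.622 bits per weight,
# gives the implied number of quantized weights:
awk 'BEGIN { printf "~%.1fB weights\n", 69.800 * 2^30 * 8 / 2.622 / 1e9 }'
# prints: ~228.7B weights
```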
PPL over 552 chunks for n_ctx=512 = 9.6827 +/- 0.07972
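A figure of this kind typically comes from llama-perplexity. Below is a sketch of how such a run might be reproduced; the test corpus, build path, and thread count are placeholders, since the actual perplexity command is not shown in this commit:

```bash
#!/usr/bin/env bash
# Hypothetical reproduction of the perplexity measurement above.
# wiki.test.raw and --threads are assumptions; substitute whatever corpus and
# hardware settings were actually used.
./build/bin/llama-perplexity \
    -m /mnt/data/models/ubergarm/MiniMax-M2.5-GGUF/MiniMax-M2.5-IQ2_KS.gguf \
    -f wiki.test.raw \
    -c 512 \
    --threads 64
```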
<details>

<summary>👈 Secret Recipe</summary>

```bash
#!/usr/bin/env bash

custom="
# 62 Repeating Layers [0-61]

# Attention [0-61] GPU
blk\..*\.attn_q.*=q8_0
blk\..*\.attn_k.*=q8_0
blk\..*\.attn_v.*=q8_0
blk\..*\.attn_output.*=q8_0

# Routed Experts Layers [0-61] CPU
blk\..*\.ffn_down_exps\.weight=iq3_ks
blk\..*\.ffn_(gate|up)_exps\.weight=iq2_ks

# Non-Repeating Layers
token_embd\.weight=iq4_k
output\.weight=iq6_k
"

custom=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

numactl -N ${SOCKET} -m ${SOCKET} \
./build/bin/llama-quantize \
    --custom-q "$custom" \
    --imatrix /mnt/data/models/ubergarm/MiniMax-M2.5-GGUF/imatrix-MiniMax-M2.5-BF16.dat \
    /mnt/data/models/ubergarm/MiniMax-M2.5-GGUF/MiniMax-M2.5-256x4.9B-BF16-00001-of-00010.gguf \
    /mnt/data/models/ubergarm/MiniMax-M2.5-GGUF/MiniMax-M2.5-IQ2_KS.gguf \
    IQ2_KS \
    128
```

</details>
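For anyone reading the recipe: the grep|sed step just strips the comment and blank lines and joins the remaining regex=type rules into the single comma-separated list that --custom-q expects. A minimal, self-contained illustration of that collapse, using a shortened rule list rather than the full recipe:

```bash
#!/usr/bin/env bash
# Toy version of the collapse step from the recipe above: comment and blank
# lines are dropped, remaining rules are joined with commas.
rules="
# this comment line is removed by grep -v '^#'
blk\..*\.attn_q.*=q8_0
blk\..*\.ffn_down_exps\.weight=iq3_ks
"
echo "$rules" | grep -v '^#' | sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
# prints: blk\..*\.attn_q.*=q8_0,blk\..*\.ffn_down_exps\.weight=iq3_ks
```

As I read the final llama-quantize call, its positional arguments follow the usual pattern: input GGUF, output GGUF, the default type (IQ2_KS) for any tensor not matched by a --custom-q rule, and the thread count (128); worth confirming against the llama-quantize build being used.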
images/perplexity.png CHANGED (Git LFS)