ubergarm committed
Commit 263adb3 · 0 Parent(s)

initial commit

Files changed (2):
  1. .gitattributes +38 -0
  2. README.md +93 -0
.gitattributes ADDED
@@ -0,0 +1,38 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
imatrix-*.dat filter=lfs diff=lfs merge=lfs -text
*.gguf filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
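The rules above can be sanity-checked with `git check-attr`, which reports the attributes a given path picks up from `.gitattributes`. A minimal sketch in a throwaway repo, using one pattern from the list above (the repo and file names are illustrative; no git-lfs install is needed just to check attributes):

```shell
# Create a throwaway repo with one of the LFS rules above, then ask git
# which attributes a matching file would receive.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
printf '%s\n' '*.gguf filter=lfs diff=lfs merge=lfs -text' > .gitattributes
git check-attr filter diff merge -- model.gguf
# model.gguf: filter: lfs
# model.gguf: diff: lfs
# model.gguf: merge: lfs
```

Any path that does not match a pattern reports `unspecified` for those attributes, which is the quickest way to confirm a new file will (or will not) be routed through LFS before committing it.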
README.md ADDED
@@ -0,0 +1,93 @@
---
quantized_by: ubergarm
pipeline_tag: text-generation
base_model: MiniMaxAI/MiniMax-M2.5
base_model_relation: quantized
license_name: modified-mit
license_link: https://github.com/MiniMax-AI/MiniMax-M2.5/blob/main/LICENSE
tags:
- imatrix
- conversational
- minimax_m2
- ik_llama.cpp
---

## WIP

- [x] download and convert_hf_to_gguf.py (automatically casts fp8 to bf16)
- [ ] imatrix computation on the full bf16 upcast GGUF
- [ ] upload the imatrix .dat file for others to do custom quantizations
- [ ] upload some logs/ files so others can see commands and early perplexity values for partial runs
- [ ] cook Q8_0 and test perplexity
- [ ] figure out initial custom quantization recipes
- [ ] release IQ5_K
- [ ] release smaller quants, prioritized based on which discussions are opened
- [ ] release a graph of perplexity across this quant collection

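The first two checklist items map onto commands roughly like the following. This is a sketch, not the exact invocation used here: the directory, output file names, and corpus file are illustrative, and flags can differ across `ik_llama.cpp` versions.

```shell
# 1. Convert the downloaded HF safetensors to GGUF; the converter
#    upcasts the fp8 weights to bf16 during conversion.
python3 convert_hf_to_gguf.py \
    --outtype bf16 \
    --outfile MiniMax-M2.5-BF16.gguf \
    ./MiniMax-M2.5/

# 2. Compute the importance matrix over a calibration corpus
#    against the full bf16 upcast GGUF.
./build/bin/llama-imatrix \
    -m MiniMax-M2.5-BF16.gguf \
    -f ubergarm-imatrix-calibration-corpus-v02.txt \
    -o imatrix-MiniMax-M2.5-BF16.dat
```

Note the `imatrix-*.dat` output name matches the LFS pattern tracked in `.gitattributes`, so the resulting file uploads through LFS automatically.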
## `ik_llama.cpp` imatrix Quantizations of MiniMaxAI/MiniMax-M2.5
*NOTE*: `ik_llama.cpp` can also run your existing GGUFs from bartowski, unsloth, mradermacher, etc., if you want to try it out before downloading my quants.

Some of ik's new quant types are supported by the [Nexesenex/croco.cpp](https://github.com/Nexesenex/croco.cpp) fork of KoboldCPP, which has Windows builds for CUDA 12.9. Also check out the [Windows builds by Thireus here](https://github.com/Thireus/ik_llama.cpp/releases), which have been built against CUDA 12.8.

These quants provide best-in-class perplexity for the given memory footprint.

## Big Thanks
Shout out to Wendell and the **Level1Techs** crew, the community [Forums](https://forum.level1techs.com/t/deepseek-deep-dive-r1-at-home/225826), and [YouTube Channel](https://www.youtube.com/@Level1Techs)! **BIG thanks** for providing **BIG hardware** expertise and access to run these experiments and make these great quants available to the community!!!

Also thanks to all the folks in the quanting and inferencing community on the [BeaverAI Club Discord](https://huggingface.co/BeaverAI) and on [r/LocalLLaMA](https://www.reddit.com/r/LocalLLaMA/) for tips and tricks that help each other run, test, and benchmark all the fun new models! Thanks to Hugging Face for hosting all these big quants!

Finally, I *really* appreciate the support from [aifoundry.org](https://aifoundry.org), so check out their open-source RISC-V based solutions!

## Quant Collection
Perplexity computed against *wiki.test.raw* (lower is "better").

![Perplexity Chart](images/perplexity.png "Chart showing Perplexity vs Model Size.")

These two are just test quants for baseline perplexity comparison and are not available for download here:
* `BF16` TODO
  - TODO
* `Q8_0` TODO
  - TODO

*NOTE*: The first split file is much smaller on purpose so that it only contains metadata; it's fine!

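The perplexity values reported here come from the standard *wiki.test.raw* run with the `llama-perplexity` tool. A sketch of the command (the model file name and context size are illustrative):

```shell
# Perplexity over the standard wikitext-2 test set (lower is better)
./build/bin/llama-perplexity \
    -m MiniMax-M2.5-Q8_0.gguf \
    -f wiki.test.raw \
    --ctx-size 512
```

Keeping the same test file and context size across every quant in the collection is what makes the chart above an apples-to-apples comparison.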
## IQ5_K TODO
TODO

<details>

<summary>👈 Secret Recipe</summary>

```bash
echo TODO
```

</details>

## Quick Start

```bash
# Clone and checkout
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp

# Build for hybrid CPU+CUDA
cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON
cmake --build build --config Release -j $(nproc)

# Hybrid CPU and Single GPU
echo TODO

# Hybrid CPU and Multi GPU
echo TODO

# CPU-Only
echo TODO
```

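While the exact launch commands above are still TODO, a hybrid CPU+GPU launch for a large MoE model typically looks something like this sketch. Every flag value, the tensor-override regex, and the model file name are illustrative assumptions, not tested settings for this model:

```shell
# Hybrid sketch: keep routed experts on CPU, offload the rest to GPU
./build/bin/llama-server \
    --model MiniMax-M2.5-IQ5_K.gguf \
    --ctx-size 32768 \
    -fa \
    -ngl 99 \
    -ot ".ffn_.*_exps.=CPU" \
    --threads $(nproc) \
    --host 127.0.0.1 --port 8080
```

The `-ot` (override-tensor) pattern is the usual trick for MoE models on a single GPU: the routed expert tensors dominate the memory footprint, so pinning them to CPU lets `-ngl 99` offload attention and shared layers to VRAM.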
For tool use you can always bring your own template with `--chat-template-file myTemplate.jinja`, and you might also need `--special`, etc.

## References
* [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp)
* [Getting Started Guide (already out of date lol)](https://github.com/ikawrakow/ik_llama.cpp/discussions/258)
* [ubergarm-imatrix-calibration-corpus-v02.txt](https://gist.github.com/ubergarm/edfeb3ff9c6ec8b49e88cdf627b0711a?permalink_comment_id=5682584#gistcomment-5682584)