---
base_model: MiniMaxAI/MiniMax-M2.5
base_model_relation: quantized
license: other
license_name: modified-mit
license_link: LICENSE
tags:
- gguf
- quantized
- llama.cpp
---
# MiniMax-M2.5 GGUF

GGUF quantization of `MiniMaxAI/MiniMax-M2.5`, created with `llama.cpp`.
## Model Details

| Property | Value |
| --- | --- |
| Base model | MiniMaxAI/MiniMax-M2.5 |
| Architecture | Mixture of Experts (MoE) |
| Total parameters | 230B |
| Active parameters | 10B per token |
| Layers | 62 |
| Total experts | 256 |
| Active experts per token | 8 |
| Source precision | FP8 (`float8_e4m3fn`) |
## Available Quantizations

| Quantization | Size | Description |
| --- | --- | --- |
| Q6_K | 175 GB | 6-bit K-quant, strong quality/size balance |
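The files can be fetched with `huggingface-cli`. The repository ID and file name below are illustrative placeholders; check this repo's "Files and versions" tab for the actual names (GGUFs of this size are often split into several parts, in which case pointing `llama.cpp` at the first part is enough).

```bash
# Download the Q6_K GGUF into ./models
# (repo ID and file name are placeholders -- adjust to the real layout)
huggingface-cli download <your-namespace>/MiniMax-M2.5-GGUF \
  MiniMax-M2.5.Q6_K.gguf --local-dir ./models
```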
## Usage

These GGUFs can be used with `llama.cpp` and compatible frontends.

```bash
# Example with llama-cli
llama-cli -m MiniMax-M2.5.Q6_K.gguf -p "Hello" -n 128
```
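For serving, `llama-server` exposes an OpenAI-compatible HTTP API. A minimal sketch, assuming a recent `llama.cpp` build and the file name used above:

```bash
# Start an OpenAI-compatible server on port 8080
llama-server -m MiniMax-M2.5.Q6_K.gguf -c 8192 --port 8080

# Query it from another shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "max_tokens": 128}'
```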
## Notes

- The source model uses FP8 (`float8_e4m3fn`) precision.
- This is a large MoE model and requires significant memory; a partial-offload sketch is shown below.
- Quantized from the official `MiniMaxAI/MiniMax-M2.5` weights.
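If the full Q6_K file does not fit in GPU memory, layers can be split between GPU and CPU. A minimal sketch, assuming GPU support was compiled in; the layer count is illustrative and should be tuned to the available VRAM:

```bash
# Offload part of the 62 layers to the GPU, keep the rest on CPU
# (-ngl / --n-gpu-layers controls how many layers go to the GPU)
llama-cli -m MiniMax-M2.5.Q6_K.gguf -ngl 20 -c 4096 -p "Hello" -n 128
```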