# MiniMax-M2.5 GGUF
GGUF quantizations of MiniMaxAI/MiniMax-M2.5, created with llama.cpp.
## Model Details
| Property | Value |
|---|---|
| Base model | MiniMaxAI/MiniMax-M2.5 |
| Architecture | Mixture of Experts (MoE) |
| Total parameters | 230B |
| Active parameters | 10B per token |
| Layers | 62 |
| Total experts | 256 |
| Active experts per token | 8 |
| Source precision | FP8 (float8_e4m3fn) |
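If you want to check these details against a downloaded file, the `gguf` Python package maintained alongside llama.cpp (`gguf-py`) ships a `gguf-dump` script that prints GGUF metadata such as block and expert counts; the filename below is just an example.

```bash
# Inspect GGUF metadata (layer count, expert count, quantization types, ...)
pip install gguf
gguf-dump MiniMax-M2.5-Q4_K_M.gguf | head -n 60
```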
## Available Quantizations
| Quantization | Size | Description |
|---|---|---|
| Q8_0 | 227 GB | 8-bit quantization, highest quality |
| Q4_K_M | 129 GB | 4-bit K-quant (medium), good balance of quality and size |
| IQ3_S | 92 GB | 3-bit importance quantization (small), compact |
| Q2_K | 78 GB | 2-bit K-quant, smallest size |
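To fetch a single quantization instead of the whole repository, `huggingface-cli download` can filter by filename pattern. A minimal sketch, assuming the files follow the `MiniMax-M2.5-<quant>*.gguf` naming used in the usage example below (the larger quants may be split across several shards):

```bash
# Download only the Q4_K_M files from this repo
pip install -U "huggingface_hub[cli]"
huggingface-cli download marksverdhei/MiniMax-M2.5-GGUF \
  --include "*Q4_K_M*" --local-dir ./MiniMax-M2.5-GGUF
```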
## Usage
These GGUFs can be used with llama.cpp and compatible frontends.
```bash
# Example with llama-cli
llama-cli -m MiniMax-M2.5-Q4_K_M.gguf -p "Hello" -n 128
```
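Beyond one-shot CLI runs, the same GGUF can be served over an OpenAI-compatible HTTP API with `llama-server`. A minimal sketch; the context size and GPU offload values are placeholders to tune for your hardware:

```bash
# Serve the model with llama.cpp's built-in OpenAI-compatible server
llama-server -m MiniMax-M2.5-Q4_K_M.gguf -c 8192 -ngl 99 --port 8080
```

Once running, any OpenAI-compatible client can be pointed at `http://localhost:8080/v1`.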
## Notes
- The source model uses FP8 (`float8_e4m3fn`) precision, so Q8_0 is effectively lossless relative to the source weights.
- This is a large MoE model. Even the smallest quant (Q2_K) requires ~78 GB due to the number of experts (see the offload sketch after this list).
- Quantized from the official MiniMaxAI/MiniMax-M2.5 weights.
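If the chosen quant does not fit in VRAM, a common llama.cpp approach for MoE models is to keep attention and shared weights on the GPU while pinning the expert tensors to system RAM. A sketch, assuming a recent llama.cpp build with the `--override-tensor`/`-ot` flag and the usual `ffn_*_exps` tensor naming:

```bash
# Offload all layers to GPU, but keep the MoE expert tensors in system RAM
llama-cli -m MiniMax-M2.5-Q4_K_M.gguf -p "Hello" -n 128 \
  -ngl 99 -ot ".ffn_.*_exps.=CPU"
```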