# MiniMax-M2.5 GGUF

GGUF quantizations of MiniMaxAI/MiniMax-M2.5, created with llama.cpp.

## Model Details

| Property | Value |
|---|---|
| Base model | MiniMaxAI/MiniMax-M2.5 |
| Architecture | Mixture of Experts (MoE) |
| Total parameters | 230B |
| Active parameters | 10B per token |
| Layers | 62 |
| Total experts | 256 |
| Active experts per token | 8 |
| Source precision | FP8 (float8_e4m3fn) |
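
You can check these values against the file itself with the `gguf-dump` tool from the `gguf` Python package (`pip install gguf`, maintained in the llama.cpp repo). The metadata key names shown in the comment are an assumption based on llama.cpp's usual `<arch>.*` convention, not confirmed for this architecture:

```bash
# Print GGUF metadata without listing every tensor.
# Expect keys like minimax-m2.expert_count and minimax-m2.block_count
# (key names assumed from llama.cpp's <arch>.* convention).
gguf-dump --no-tensors MiniMax-M2.5-Q4_K_M.gguf
```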

## Available Quantizations

| Quantization | Size | Description |
|---|---|---|
| Q8_0 | 227 GB | 8-bit quantization, highest quality |
| Q4_K_M | 129 GB | 4-bit K-quant (medium), good balance of quality and size |
| IQ3_S | 92 GB | 3-bit importance quantization (small), compact |
| Q2_K | 78 GB | 2-bit K-quant, smallest size |
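
To fetch a single quantization without cloning the whole repository, `huggingface-cli download` with an include pattern works well. Files above the Hub's size limit are typically sharded into multiple `.gguf` parts, so a glob pattern is safer than an exact filename:

```bash
# Download only the Q4_K_M files into the current directory.
# The pattern is assumed from the filenames used elsewhere in this card.
huggingface-cli download marksverdhei/MiniMax-M2.5-GGUF \
  --include "*Q4_K_M*" --local-dir .
```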

## Usage

These GGUFs can be used with llama.cpp and compatible frontends.

```bash
# Example with llama-cli
llama-cli -m MiniMax-M2.5-Q4_K_M.gguf -p "Hello" -n 128
```
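
For an OpenAI-compatible HTTP endpoint, `llama-server` from the same llama.cpp build can serve the model; the port and context size below are illustrative, not recommendations:

```bash
# Serve the model over an OpenAI-compatible API (values are illustrative).
llama-server -m MiniMax-M2.5-Q4_K_M.gguf -c 8192 --port 8080
```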

## Notes

- The source model uses FP8 (float8_e4m3fn) precision, so Q8_0 is effectively lossless relative to the source weights.
- This is a large MoE model: even the smallest quant (Q2_K) needs ~78 GB for the weights alone, because all 256 experts are stored even though only 8 are active per token. See the offloading sketch below for running with limited VRAM.
- Quantized from the official MiniMaxAI/MiniMax-M2.5 weights.
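
Because only a small fraction of the weights is active per token, a common llama.cpp pattern for large MoE models is to keep attention and shared layers on the GPU while pinning the expert tensors to system RAM via `--override-tensor` (available in recent llama.cpp builds). The tensor-name regex below is an assumption based on the usual `ffn_*_exps` naming in llama.cpp MoE graphs; verify against `gguf-dump` output for this model first:

```bash
# Offload all layers to GPU, but keep expert FFN weights in system RAM.
# The tensor-name pattern is an assumption; check your file's tensor names.
llama-cli -m MiniMax-M2.5-Q4_K_M.gguf \
  -ngl 99 \
  --override-tensor "\.ffn_.*_exps\.=CPU" \
  -p "Hello" -n 128
```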