---
base_model: MiniMaxAI/MiniMax-M2.5
base_model_relation: quantized
license: other
license_name: modified-mit
license_link: LICENSE
tags:
  - gguf
  - quantized
  - llama.cpp
---

# MiniMax-M2.5 GGUF

GGUF quantization of [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5), created with llama.cpp.

## Model Details

| Property | Value |
|---|---|
| Base model | MiniMaxAI/MiniMax-M2.5 |
| Architecture | Mixture of Experts (MoE) |
| Total parameters | 230B |
| Active parameters | 10B per token |
| Layers | 62 |
| Total experts | 256 |
| Active experts per token | 8 |
| Source precision | FP8 (float8_e4m3fn) |
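
You can verify the architecture metadata above (layer count, expert count, quantization type) directly from the GGUF header. A minimal sketch using the `gguf-dump` tool from llama.cpp's `gguf` Python package; the file name assumes the Q6_K download listed below:

```bash
# Print GGUF header metadata; skip per-tensor details.
# Assumes the gguf Python package is installed (pip install gguf).
gguf-dump --no-tensors MiniMax-M2.5.Q6_K.gguf
```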

## Available Quantizations

| Quantization | Size | Description |
|---|---|---|
| Q6_K | 175 GB | 6-bit K-quant, strong quality/size balance |
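
As a rough sanity check on the size (assuming Q6_K's nominal ~6.5625 bits per weight and a binary-prefix listing): 230 × 10⁹ parameters × 6.5625 bits ÷ 8 ≈ 189 × 10⁹ bytes ≈ 176 GiB, which lines up with the 175 GB figure above.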

## Usage

These GGUFs can be used with llama.cpp and compatible frontends.

```bash
# Example with llama-cli
llama-cli -m MiniMax-M2.5.Q6_K.gguf -p "Hello" -n 128
```
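
For an OpenAI-compatible HTTP endpoint, the same file can be served with llama-server. A minimal sketch; the context size, GPU layer count, and port are illustrative values, not tuned recommendations:

```bash
# Serve the model over an OpenAI-compatible HTTP API.
# -c sets the context window; -ngl offloads up to that many layers to the GPU.
llama-server -m MiniMax-M2.5.Q6_K.gguf -c 8192 -ngl 99 --port 8080
```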

## Notes

- The source model uses FP8 (float8_e4m3fn) precision.
- This is a large MoE model and requires significant memory; the Q6_K file alone is 175 GB.
- Quantized from the official MiniMaxAI/MiniMax-M2.5 weights.
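
To fetch just one quantization instead of cloning the whole repository, `huggingface-cli` supports per-file downloads. A minimal sketch; the repo id below is a hypothetical placeholder, substitute this repository's actual id:

```bash
# Download only the Q6_K file from the Hugging Face Hub.
# NOTE: the repo id is an assumed placeholder; replace it with the real one.
huggingface-cli download thad0ctor/MiniMax-M2.5-GGUF MiniMax-M2.5.Q6_K.gguf --local-dir .
```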