---
base_model: MiniMaxAI/MiniMax-M2.5
base_model_relation: quantized
license: other
license_name: modified-mit
license_link: LICENSE
tags:
  - gguf
  - quantized
  - llama.cpp
---

# MiniMax-M2.5 GGUF

GGUF quantization of [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5), created with llama.cpp.

## Model Details

| Property | Value |
|---|---|
| Base model | MiniMaxAI/MiniMax-M2.5 |
| Architecture | Mixture of Experts (MoE) |
| Total parameters | 230B |
| Active parameters | 10B per token |
| Layers | 62 |
| Total experts | 256 |
| Active experts per token | 8 |
| Source precision | FP8 (float8_e4m3fn) |
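
You can verify the architecture metadata above (layer count, expert count, quantization type) directly from the GGUF header. A minimal sketch using the `gguf-dump` tool from llama.cpp's `gguf` Python package; the file name assumes the Q6_K download listed below:

```bash
# Print GGUF header metadata; skip per-tensor details.
# Assumes the gguf Python package is installed (pip install gguf).
gguf-dump --no-tensors MiniMax-M2.5.Q6_K.gguf
```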

## Available Quantizations

| Quantization | Size | Description |
|---|---|---|
| Q6_K | 175 GB | 6-bit K-quant, strong quality/size balance |
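
As a rough sanity check on the size (assuming Q6_K's nominal ~6.5625 bits per weight and a binary-prefix listing): 230 × 10⁹ parameters × 6.5625 bits ÷ 8 ≈ 189 × 10⁹ bytes ≈ 176 GiB, which lines up with the 175 GB figure above.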

## Usage

These GGUFs can be used with llama.cpp and compatible frontends.

```bash
# Example with llama-cli
llama-cli -m MiniMax-M2.5.Q6_K.gguf -p "Hello" -n 128
```
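
For an OpenAI-compatible HTTP endpoint, the same file can be served with llama-server. A minimal sketch; the context size, GPU layer count, and port are illustrative values, not tuned recommendations:

```bash
# Serve the model over an OpenAI-compatible HTTP API.
# -c sets the context window; -ngl offloads up to that many layers to the GPU.
llama-server -m MiniMax-M2.5.Q6_K.gguf -c 8192 -ngl 99 --port 8080
```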

## Notes

- The source model uses FP8 (float8_e4m3fn) precision.
- This is a large MoE model and requires significant memory; the Q6_K file alone is 175 GB.
- Quantized from the official MiniMaxAI/MiniMax-M2.5 weights.
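
To fetch just one quantization instead of cloning the whole repository, `huggingface-cli` supports per-file downloads. A minimal sketch; the repo id below is a hypothetical placeholder, substitute this repository's actual id:

```bash
# Download only the Q6_K file from the Hugging Face Hub.
# NOTE: the repo id is an assumed placeholder; replace it with the real one.
huggingface-cli download thad0ctor/MiniMax-M2.5-GGUF MiniMax-M2.5.Q6_K.gguf --local-dir .
```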