---
base_model: MiniMaxAI/MiniMax-M2.5
base_model_relation: quantized
license: other
license_name: modified-mit
license_link: LICENSE
tags:
- gguf
- quantized
- llama.cpp
---
# MiniMax-M2.5 GGUF

GGUF quantization of `MiniMaxAI/MiniMax-M2.5`, created with `llama.cpp`.
## Model Details

| Property | Value |
| --- | --- |
| Base model | MiniMaxAI/MiniMax-M2.5 |
| Architecture | Mixture of Experts (MoE) |
| Total parameters | 230B |
| Active parameters | 10B per token |
| Layers | 62 |
| Total experts | 256 |
| Active experts per token | 8 |
| Source precision | FP8 (`float8_e4m3fn`) |
## Available Quantizations

| Quantization | Size | Description |
| --- | --- | --- |
| Q6_K | 175 GB | 6-bit K-quant, strong quality/size balance |
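The files can be fetched with `huggingface-cli`. The repository ID and file name below are illustrative placeholders; check this repo's "Files and versions" tab for the actual names (GGUFs of this size are often split into several parts, in which case pointing `llama.cpp` at the first part is enough).

```bash
# Download the Q6_K GGUF into ./models
# (repo ID and file name are placeholders -- adjust to the real layout)
huggingface-cli download <your-namespace>/MiniMax-M2.5-GGUF \
  MiniMax-M2.5.Q6_K.gguf --local-dir ./models
```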
## Usage

These GGUFs can be used with `llama.cpp` and compatible frontends.

```bash
# Example with llama-cli
llama-cli -m MiniMax-M2.5.Q6_K.gguf -p "Hello" -n 128
```
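For serving, `llama-server` exposes an OpenAI-compatible HTTP API. A minimal sketch, assuming a recent `llama.cpp` build and the file name used above:

```bash
# Start an OpenAI-compatible server on port 8080
llama-server -m MiniMax-M2.5.Q6_K.gguf -c 8192 --port 8080

# Query it from another shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "max_tokens": 128}'
```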
## Notes

- The source model uses FP8 (`float8_e4m3fn`) precision.
- This is a large MoE model and requires significant memory; a partial-offload sketch is shown below.
- Quantized from the official `MiniMaxAI/MiniMax-M2.5` weights.
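If the full Q6_K file does not fit in GPU memory, layers can be split between GPU and CPU. A minimal sketch, assuming GPU support was compiled in; the layer count is illustrative and should be tuned to the available VRAM:

```bash
# Offload part of the 62 layers to the GPU, keep the rest on CPU
# (-ngl / --n-gpu-layers controls how many layers go to the GPU)
llama-cli -m MiniMax-M2.5.Q6_K.gguf -ngl 20 -c 4096 -p "Hello" -n 128
```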