MoE quants of GLM-5 (Q8_0 quantization by default, with the routed experts quantized further)

Note: running this GGUF requires pulling and compiling this llama.cpp PR: https://github.com/ggml-org/llama.cpp/pull/19460
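A minimal build sketch, assuming a Linux-like environment with git and CMake on the PATH (the local branch name `pr-19460` is arbitrary):

```bash
# Clone llama.cpp and check out the PR branch
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git fetch origin pull/19460/head:pr-19460
git checkout pr-19460

# Standard CMake build; add -DGGML_CUDA=ON for NVIDIA GPU offload
cmake -B build
cmake --build build --config Release -j
```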

More quants to come soon.

| Quant | Size | Mixture | PPL | KLD |
|-------|------|---------|-----|-----|
| Q4_K_M | 432.80 GiB (4.93 BPW) | Q8_0-Q4_K-Q4_K-Q5_K | 8.7486 ± 0.17123 | TBD |
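To fetch just this quant and serve it, something like the following should work; the `--include` pattern and the shard filename are assumptions, so check the repo's file listing for the actual names:

```bash
# Download only the Q4_K_M shards of this repo
huggingface-cli download AesSedai/GLM-5-GGUF --include "*Q4_K_M*" --local-dir ./GLM-5-GGUF

# Point llama.cpp at the first shard; split GGUFs load the remaining
# shards automatically. The filename below is illustrative.
./build/bin/llama-server -m ./GLM-5-GGUF/GLM-5-Q4_K_M-00001-of-00009.gguf
```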

Model size: 754B params
Architecture: glm-dsa
Base model: zai-org/GLM-5