This model is a compressed version of [Qwen/Qwen3-Coder-Next](https://huggingfac
It is obtained by reducing the number of experts in each MoE layer from 512 to 384.
This reduction is achieved by the REAM method described in https://bknyaz.github.io/blog/2026/moe/.

**Compared to other models in this collection, more code data is used in the calibration set during pruning/merging, to better preserve the original model's coding abilities.
Specifically, the ratio between c4, math, and coding data (see https://bknyaz.github.io/blog/2026/moe/) is 0.0, 0.7, 0.3.
The calibration data used here is the same as in our [Qwen3-Coder-Next-REAP](https://huggingface.co/SamsungSAILMontreal/Qwen3-Coder-Next-REAP).
Compared to other REAM models, here we used C=32 (the number of experts per group) instead of C=16, which we found to work better.**
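
For intuition, here is a toy sketch of shrinking an MoE layer from 512 to 384 experts by keeping the experts that receive the most routing mass on calibration data. This is **not** the actual REAM method (which, per the blog post above, also merges experts within groups of C experts rather than only dropping them); all class and function names below are hypothetical:

```python
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    """Minimal dense-routed MoE layer for illustration only."""

    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model) -> weighted sum of all expert outputs
        weights = self.router(x).softmax(dim=-1)            # (tokens, n_experts)
        outs = torch.stack([e(x) for e in self.experts])    # (n_experts, tokens, d_model)
        return torch.einsum("te,etd->td", weights, outs)

def prune_experts(moe: ToyMoE, calib: torch.Tensor, keep: int) -> ToyMoE:
    """Keep the `keep` experts with the largest total routing mass on `calib`."""
    with torch.no_grad():
        usage = moe.router(calib).softmax(dim=-1).sum(dim=0)  # mass per expert
    kept = usage.topk(keep).indices.sort().values             # preserve expert order
    pruned = ToyMoE(calib.shape[-1], keep)
    pruned.router.weight.data = moe.router.weight.data[kept]  # shrink the router too
    pruned.experts = nn.ModuleList(moe.experts[i] for i in kept.tolist())
    return pruned

moe = ToyMoE(d_model=16, n_experts=512)
calib = torch.randn(64, 16)
pruned = prune_experts(moe, calib, keep=384)
assert len(pruned.experts) == 384
assert pruned(calib).shape == (64, 16)
```

Real MoE layers use sparse top-k routing and gated FFN experts, and REAM additionally merges similar experts within each group of C; this sketch only shows the bookkeeping of dropping experts and the matching router rows.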