bknyaz committed · verified
Commit 101a1f8 · 1 Parent(s): 5bfc54c

Update README.md

Files changed (1): README.md (+1 -1)
README.md CHANGED
@@ -16,7 +16,7 @@ This model is a compressed version of [Qwen/Qwen3-Coder-Next](https://huggingface.co/Qwen/Qwen3-Coder-Next).
  It is obtained by reducing the number of experts in each MoE layer from 512 to 384.
  This reduction is achieved by the REAM method described in https://bknyaz.github.io/blog/2026/moe/.

- **Compared to other models obtained in this collection, more coding sequences used in the calibration data during pruning/merging
+ **Compared to other models obtained in this collection, more code data is used in the calibration data during pruning/merging
  to better preserve the original model's coding abilities. Specifically, the ratio between c4, math and coding data (see https://bknyaz.github.io/blog/2026/moe/) is 0.0, 0.7, 0.3.
  The calibration data used here is the same as in our [Qwen3-Coder-Next-REAP](https://huggingface.co/SamsungSAILMontreal/Qwen3-Coder-Next-REAP).
  Compared to other REAM models, here we used C=32 (number of experts in groups) instead of C=16, which we found to work better.**
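The changed line above pins the calibration mixture at c4 : math : code = 0.0 : 0.7 : 0.3. For reference, below is a minimal Python sketch of drawing a calibration set in that proportion. This is not the REAM implementation (see the blog post linked in the diff); the `pools` argument, the function name, and the sequence counts in the example are illustrative assumptions.

```python
# Minimal sketch (illustrative only, not the REAM code) of sampling a
# calibration set in the ratio stated in the README diff:
# c4 : math : code = 0.0 : 0.7 : 0.3.
import random

RATIOS = {"c4": 0.0, "math": 0.7, "code": 0.3}  # from the README diff

def sample_calibration_mix(pools, n_total, seed=0):
    """Draw n_total sequences from hypothetical text pools in the stated ratio.

    pools: dict mapping source name ("c4", "math", "code") -> list of sequences.
    """
    rng = random.Random(seed)
    mix = []
    for name, ratio in RATIOS.items():
        k = round(n_total * ratio)
        if k:  # c4 contributes nothing at ratio 0.0
            mix.extend(rng.choices(pools[name], k=k))
    rng.shuffle(mix)
    return mix

# Example: n_total=256 yields ~179 math and ~77 code sequences, and no c4.
```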