This model is a compressed version of [Qwen/Qwen3-Coder-Next](https://huggingfac
It is obtained by reducing the number of experts in each MoE layer from 512 to 384.
This reduction is achieved by the REAM method described in https://bknyaz.github.io/blog/2026/moe/.

**Compared to other models in this collection, more code data is used in the calibration set during pruning/merging, to better preserve the original model's coding abilities.
Specifically, the ratio between c4, math, and coding data (see https://bknyaz.github.io/blog/2026/moe/) is 0.0, 0.7, 0.3.
The calibration data used here is the same as in our [Qwen3-Coder-Next-REAP](https://huggingface.co/SamsungSAILMontreal/Qwen3-Coder-Next-REAP).
Compared to other REAM models, here we used C=32 (the number of experts per group) instead of C=16, which we found to work better.**
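
For intuition, here is a toy sketch of shrinking an MoE layer from 512 to 384 experts by keeping the experts that receive the most routing mass on calibration data. This is **not** the actual REAM method (which, per the blog post above, also merges experts within groups of C experts rather than only dropping them); all class and function names below are hypothetical:

```python
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    """Minimal dense-routed MoE layer for illustration only."""

    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model) -> weighted sum of all expert outputs
        weights = self.router(x).softmax(dim=-1)            # (tokens, n_experts)
        outs = torch.stack([e(x) for e in self.experts])    # (n_experts, tokens, d_model)
        return torch.einsum("te,etd->td", weights, outs)

def prune_experts(moe: ToyMoE, calib: torch.Tensor, keep: int) -> ToyMoE:
    """Keep the `keep` experts with the largest total routing mass on `calib`."""
    with torch.no_grad():
        usage = moe.router(calib).softmax(dim=-1).sum(dim=0)  # mass per expert
    kept = usage.topk(keep).indices.sort().values             # preserve expert order
    pruned = ToyMoE(calib.shape[-1], keep)
    pruned.router.weight.data = moe.router.weight.data[kept]  # shrink the router too
    pruned.experts = nn.ModuleList(moe.experts[i] for i in kept.tolist())
    return pruned

moe = ToyMoE(d_model=16, n_experts=512)
calib = torch.randn(64, 16)
pruned = prune_experts(moe, calib, keep=384)
assert len(pruned.experts) == 384
assert pruned(calib).shape == (64, 16)
```

Real MoE layers use sparse top-k routing and gated FFN experts, and REAM additionally merges similar experts within each group of C; this sketch only shows the bookkeeping of dropping experts and the matching router rows.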