Add pipeline_tag and link to technical report
#24 - opened by nielsr (HF Staff)

README.md CHANGED
````diff
@@ -1,6 +1,10 @@
 ---
-license: mit
 library_name: transformers
+license: mit
+pipeline_tag: text-generation
+tags:
+- moe
+- mixture-of-experts
 ---
 
 <br/><br/>
````
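For context on the metadata hunk above: `library_name: transformers` together with the added `pipeline_tag: text-generation` routes the repo to the standard text-generation loading path on the Hub. Below is a minimal loading sketch under those tags; the `trust_remote_code`, dtype, and device settings are illustrative assumptions, not something this diff specifies.

```python
# Minimal sketch of the loading path implied by the added metadata
# (library_name: transformers, pipeline_tag: text-generation).
# Assumptions: custom modeling code may require trust_remote_code, and the
# checkpoint needs substantial GPU memory; adjust dtype/device_map as needed.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="XiaomiMiMo/MiMo-V2-Flash",
    trust_remote_code=True,  # assumption, not stated in the diff
    device_map="auto",
    torch_dtype="auto",
)

print(generator("Briefly introduce MiMo-V2-Flash.", max_new_tokens=64)[0]["generated_text"])
```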
````diff
@@ -18,7 +22,7 @@ library_name: transformers
 
 <a href="https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash" target="_blank">🤗 HuggingFace</a>
 
-<a href="https://
+<a href="https://huggingface.co/papers/2601.02780" target="_blank">📄 Technical Report </a>
 
 <a href="https://mimo.xiaomi.com/blog/mimo-v2-flash" target="_blank">📰 Blog </a>
 
````
````diff
@@ -141,7 +145,7 @@ Following our Post-Training Paradigm with MOPD and Agentic RL, the model achieve
 | **General Agent** | | | | | | |
 | BrowseComp | 45.4 | - | 51.4 | - | 24.1 | 54.9 |
 | BrowseComp (w/ Context Manage) | 58.3 | 60.2 | 67.6 | 59.2 | - | - |
-
+| \(\tau^2\)-Bench | 80.3 | 74.3 | 80.3 | 85.4 | 84.7 | 80.2 |
 
 -----
 
````
````diff
@@ -155,7 +159,7 @@ Following our Post-Training Paradigm with MOPD and Agentic RL, the model achieve
 
 MiMo-V2-Flash addresses the quadratic complexity of long contexts by interleaving Local Sliding Window Attention (SWA) and Global Attention (GA).
 
-* **Configuration**: Stacks of
+* **Configuration**: Stacks of $M=8$ hybrid blocks. Each block contains $N=5$ SWA layers followed by 1 GA layer.
 * **Efficiency**: SWA layers use a window size of 128 tokens, reducing KV cache significantly.
 * **Sink Bias**: Learnable attention sink bias is applied to maintain performance despite the aggressive window size.
 
````
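The hybrid-attention bullets in the hunk above fully determine the layer schedule, so a small illustrative sketch is worth spelling out: it reproduces the stated $M=8$ blocks of 5 SWA layers plus 1 GA layer and contrasts the per-layer KV-cache footprint at a 128-token window. The function names and the cache comparison are illustrative assumptions, not the model's actual implementation.

```python
# Illustrative sketch of the interleaved attention schedule described above:
# M = 8 hybrid blocks, each with N = 5 sliding-window (SWA) layers followed
# by 1 global-attention (GA) layer; SWA layers use a 128-token window.
# This reproduces only the layer pattern and a rough KV-cache comparison,
# not the model's actual implementation.

M_BLOCKS = 8       # hybrid blocks
SWA_PER_BLOCK = 5  # SWA layers per block
WINDOW = 128       # SWA window size in tokens

def layer_schedule() -> list[str]:
    """Attention type of every layer, block by block."""
    schedule: list[str] = []
    for _ in range(M_BLOCKS):
        schedule.extend(["swa"] * SWA_PER_BLOCK)
        schedule.append("ga")
    return schedule

def kv_tokens_cached(context_len: int) -> dict[str, int]:
    """Tokens whose keys/values must stay cached, per layer type."""
    return {
        "swa_layer": min(context_len, WINDOW),  # only the last 128 tokens
        "ga_layer": context_len,                # the full context
    }

if __name__ == "__main__":
    sched = layer_schedule()
    print(len(sched), "layers, first block:", sched[:6])  # 48 layers total
    print(kv_tokens_cached(65536))                        # 128 vs 65536 tokens
```

Under this schedule only 8 of the 48 attention layers keep a full-context KV cache, which is where the "reducing KV cache significantly" claim in the bullet list comes from.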
````diff
@@ -202,11 +206,15 @@ MiMo-V2-Flash supports FP8 mixed precision inference. We recommend using **SGLan
 
 ### Quick Start with SGLang
 
+Following https://lmsys.org/blog/2025-12-16-mimo-v2-flash/, please use the compatible SGLang version as follows.
+
 ```bash
-pip install sglang
+pip install sglang==0.5.6.post2.dev8005+pr.15207.g39d5bd57a \
+--index-url https://sgl-project.github.io/whl/pr/ \
+--extra-index-url https://pypi.org/simple
 
-#
-python3 -m sglang.launch_server \
+#Launch the server
+SGLANG_ENABLE_SPEC_V2=1 python3 -m sglang.launch_server \
 --model-path XiaomiMiMo/MiMo-V2-Flash \
 --served-model-name mimo-v2-flash \
 --pp-size 1 \
````
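Once the server from the quick-start hunk above is running, it can be queried through SGLang's OpenAI-compatible endpoint. The snippet below is a minimal sketch: the port (SGLang's default 30000) and the sampling settings are assumptions, and the model name must match the `--served-model-name` flag.

```python
# Minimal sketch: query the SGLang server launched above via its
# OpenAI-compatible API. Assumes SGLang's default port 30000; change the
# base_url if you launched with a different --port.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="mimo-v2-flash",  # must match --served-model-name
    messages=[{"role": "user", "content": "Give a one-sentence summary of MiMo-V2-Flash."}],
    max_tokens=128,
    temperature=0.6,  # assumption; use whatever sampling settings you prefer
)
print(resp.choices[0].message.content)
```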
````diff
@@ -300,7 +308,7 @@ If you find our work helpful, please cite our technical report:
 title={MiMo-V2-Flash Technical Report},
 author={LLM-Core Xiaomi},
 year={2025},
-url={https://
+url={https://huggingface.co/papers/2601.02780}
 }
 ```
 
````