Add pipeline_tag and link to technical report

#24
by nielsr (HF Staff) · opened
Files changed (1)
  1. README.md +16 -8
README.md CHANGED
@@ -1,6 +1,10 @@
  ---
- license: mit
  library_name: transformers
+ license: mit
+ pipeline_tag: text-generation
+ tags:
+ - moe
+ - mixture-of-experts
  ---

  <br/><br/>
@@ -18,7 +22,7 @@ library_name: transformers
  |
  <a href="https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash" target="_blank">🤗 HuggingFace</a>
  &nbsp;|
- <a href="https://github.com/XiaomiMiMo/MiMo-V2-Flash/blob/main/paper.pdf" target="_blank">📔 Technical Report </a>
+ <a href="https://huggingface.co/papers/2601.02780" target="_blank">📔 Technical Report </a>
  &nbsp;|
  <a href="https://mimo.xiaomi.com/blog/mimo-v2-flash" target="_blank">📰 Blog </a>
  &nbsp;|
@@ -141,7 +145,7 @@ Following our Post-Training Paradigm with MOPD and Agentic RL, the model achieve
  | **General Agent** | | | | | | |
  | BrowseComp | 45.4 | - | 51.4 | - | 24.1 | 54.9 |
  | BrowseComp (w/ Context Manage) | 58.3 | 60.2 | 67.6 | 59.2 | - | - |
- | \\(\tau^2\\)-Bench | 80.3 | 74.3 | 80.3 | 85.4 | 84.7 | 80.2 |
+ | \(\tau^2\)-Bench | 80.3 | 74.3 | 80.3 | 85.4 | 84.7 | 80.2 |

  -----

@@ -155,7 +159,7 @@ Following our Post-Training Paradigm with MOPD and Agentic RL, the model achieve

  MiMo-V2-Flash addresses the quadratic complexity of long contexts by interleaving Local Sliding Window Attention (SWA) and Global Attention (GA).

- * **Configuration**: Stacks of \\(M=8\\) hybrid blocks. Each block contains \\(N=5\\) SWA layers followed by 1 GA layer.
+ * **Configuration**: Stacks of $M=8$ hybrid blocks. Each block contains $N=5$ SWA layers followed by 1 GA layer.
  * **Efficiency**: SWA layers use a window size of 128 tokens, reducing KV cache significantly.
  * **Sink Bias**: Learnable attention sink bias is applied to maintain performance despite the aggressive window size.

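As a quick sanity check on the hybrid layout described in the hunk above, the sketch below only illustrates the figures stated in the README (M=8 blocks, N=5 SWA layers per block, a 128-token window). It is not the model's actual configuration code, and the layer-type labels are made up for illustration.

```python
# Illustrative sketch of the hybrid attention layout (assumed names, not model code).
M_BLOCKS = 8          # stacked hybrid blocks
N_SWA_PER_BLOCK = 5   # SWA layers per block; each block ends with 1 GA layer
SWA_WINDOW = 128      # sliding-window size in tokens

layer_types = []
for _ in range(M_BLOCKS):
    layer_types += ["swa"] * N_SWA_PER_BLOCK + ["global"]

assert len(layer_types) == M_BLOCKS * (N_SWA_PER_BLOCK + 1)  # 8 * 6 = 48 layers
print(layer_types[:6])  # ['swa', 'swa', 'swa', 'swa', 'swa', 'global']

# A SWA layer only attends to the most recent SWA_WINDOW tokens, so its KV cache
# is capped at 128 entries per sequence; only the GA layers keep a cache that
# grows with the full context length.
```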
@@ -202,11 +206,15 @@ MiMo-V2-Flash supports FP8 mixed precision inference. We recommend using **SGLan

  ### Quick Start with SGLang

+ Following https://lmsys.org/blog/2025-12-16-mimo-v2-flash/, please use the compatible SGLang version as follows.
+
  ```bash
- pip install sglang
+ pip install sglang==0.5.6.post2.dev8005+pr.15207.g39d5bd57a \
+ --index-url https://sgl-project.github.io/whl/pr/ \
+ --extra-index-url https://pypi.org/simple

- # Launch server
- python3 -m sglang.launch_server \
+ # Launch the server
+ SGLANG_ENABLE_SPEC_V2=1 python3 -m sglang.launch_server \
  --model-path XiaomiMiMo/MiMo-V2-Flash \
  --served-model-name mimo-v2-flash \
  --pp-size 1 \
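Once the server launched in the hunk above is running, it can be smoke-tested against SGLang's OpenAI-compatible Chat Completions endpoint. A minimal sketch, assuming the default address http://127.0.0.1:30000 (adjust if --host or --port are passed) and the served model name from the command above:

```python
# Smoke-test the launched server via the OpenAI-compatible Chat Completions API.
# Assumes the default host/port; change the URL if the server was started differently.
import requests

resp = requests.post(
    "http://127.0.0.1:30000/v1/chat/completions",
    json={
        "model": "mimo-v2-flash",  # matches --served-model-name above
        "messages": [
            {"role": "user", "content": "Summarize MiMo-V2-Flash in one sentence."}
        ],
        "max_tokens": 128,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```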
@@ -300,7 +308,7 @@ If you find our work helpful, please cite our technical report:
  title={MiMo-V2-Flash Technical Report},
  author={LLM-Core Xiaomi},
  year={2025},
- url={https://github.com/XiaomiMiMo/MiMo-V2-Flash/paper.pdf}
+ url={https://huggingface.co/papers/2601.02780}
  }
  ```
