Add pipeline_tag and link to technical report

#24
by nielsr (HF Staff) · opened
Files changed (1)
  1. README.md +16 -8
README.md CHANGED
@@ -1,6 +1,10 @@
  ---
- license: mit
  library_name: transformers
+ license: mit
+ pipeline_tag: text-generation
+ tags:
+ - moe
+ - mixture-of-experts
  ---

  <br/><br/>
@@ -18,7 +22,7 @@ library_name: transformers
  |
  <a href="https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash" target="_blank">🤗 HuggingFace</a>
  &nbsp;|
- <a href="https://github.com/XiaomiMiMo/MiMo-V2-Flash/blob/main/paper.pdf" target="_blank">📔 Technical Report </a>
+ <a href="https://huggingface.co/papers/2601.02780" target="_blank">📔 Technical Report </a>
  &nbsp;|
  <a href="https://mimo.xiaomi.com/blog/mimo-v2-flash" target="_blank">📰 Blog </a>
  &nbsp;|
@@ -141,7 +145,7 @@ Following our Post-Training Paradigm with MOPD and Agentic RL, the model achieve
  | **General Agent** | | | | | | |
  | BrowseComp | 45.4 | - | 51.4 | - | 24.1 | 54.9 |
  | BrowseComp (w/ Context Manage) | 58.3 | 60.2 | 67.6 | 59.2 | - | - |
- | \\(\tau^2\\)-Bench | 80.3 | 74.3 | 80.3 | 85.4 | 84.7 | 80.2 |
+ | \(\tau^2\)-Bench | 80.3 | 74.3 | 80.3 | 85.4 | 84.7 | 80.2 |

  -----

@@ -155,7 +159,7 @@ Following our Post-Training Paradigm with MOPD and Agentic RL, the model achieve

  MiMo-V2-Flash addresses the quadratic complexity of long contexts by interleaving Local Sliding Window Attention (SWA) and Global Attention (GA).

- * **Configuration**: Stacks of \\(M=8\\) hybrid blocks. Each block contains \\(N=5\\) SWA layers followed by 1 GA layer.
+ * **Configuration**: Stacks of $M=8$ hybrid blocks. Each block contains $N=5$ SWA layers followed by 1 GA layer.
  * **Efficiency**: SWA layers use a window size of 128 tokens, reducing KV cache significantly.
  * **Sink Bias**: Learnable attention sink bias is applied to maintain performance despite the aggressive window size.

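As a quick sanity check on the hybrid layout described in the hunk above, the sketch below only illustrates the figures stated in the README (M=8 blocks, N=5 SWA layers per block, a 128-token window). It is not the model's actual configuration code, and the layer-type labels are made up for illustration.

```python
# Illustrative sketch of the hybrid attention layout (assumed names, not model code).
M_BLOCKS = 8          # stacked hybrid blocks
N_SWA_PER_BLOCK = 5   # SWA layers per block; each block ends with 1 GA layer
SWA_WINDOW = 128      # sliding-window size in tokens

layer_types = []
for _ in range(M_BLOCKS):
    layer_types += ["swa"] * N_SWA_PER_BLOCK + ["global"]

assert len(layer_types) == M_BLOCKS * (N_SWA_PER_BLOCK + 1)  # 8 * 6 = 48 layers
print(layer_types[:6])  # ['swa', 'swa', 'swa', 'swa', 'swa', 'global']

# A SWA layer only attends to the most recent SWA_WINDOW tokens, so its KV cache
# is capped at 128 entries per sequence; only the GA layers keep a cache that
# grows with the full context length.
```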
@@ -202,11 +206,15 @@ MiMo-V2-Flash supports FP8 mixed precision inference. We recommend using **SGLan

  ### Quick Start with SGLang

+ Following https://lmsys.org/blog/2025-12-16-mimo-v2-flash/, please use the compatible SGLang version as follows.
+
  ```bash
- pip install sglang
+ pip install sglang==0.5.6.post2.dev8005+pr.15207.g39d5bd57a \
+ --index-url https://sgl-project.github.io/whl/pr/ \
+ --extra-index-url https://pypi.org/simple

- # Launch server
- python3 -m sglang.launch_server \
+ # Launch the server
+ SGLANG_ENABLE_SPEC_V2=1 python3 -m sglang.launch_server \
  --model-path XiaomiMiMo/MiMo-V2-Flash \
  --served-model-name mimo-v2-flash \
  --pp-size 1 \
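Once the server launched in the hunk above is running, it can be smoke-tested against SGLang's OpenAI-compatible Chat Completions endpoint. A minimal sketch, assuming the default address http://127.0.0.1:30000 (adjust if --host or --port are passed) and the served model name from the command above:

```python
# Smoke-test the launched server via the OpenAI-compatible Chat Completions API.
# Assumes the default host/port; change the URL if the server was started differently.
import requests

resp = requests.post(
    "http://127.0.0.1:30000/v1/chat/completions",
    json={
        "model": "mimo-v2-flash",  # matches --served-model-name above
        "messages": [
            {"role": "user", "content": "Summarize MiMo-V2-Flash in one sentence."}
        ],
        "max_tokens": 128,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```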
@@ -300,7 +308,7 @@ If you find our work helpful, please cite our technical report:
  title={MiMo-V2-Flash Technical Report},
  author={LLM-Core Xiaomi},
  year={2025},
- url={https://github.com/XiaomiMiMo/MiMo-V2-Flash/paper.pdf}
+ url={https://huggingface.co/papers/2601.02780}
  }
  ```
