BigDong committed
Commit 4169e9a · 1 Parent(s): b4d7b0e

update readme

Files changed (1): README.md (+3 −1)
README.md CHANGED

@@ -50,7 +50,9 @@ MiniCPM-SALA is an efficient hybrid model in which 25% of the layers adopt [InfL
 - **Efficient Inference on Long Sequences**
 - Achieves up to 3.5x the inference speed of Qwen3-8B at a sequence length of 256K tokens on A6000D, supports inference at context lengths of up to 1M tokens on both NVIDIA A6000D and 5090 GPUs, whereas Qwen3-8B fails at this length due to out-of-memory (OOM) errors.
 
-## Usage
+## Inference
+
+To achieve optimal performance, we recommend using `Temperature=0.9`.
 
 ### HuggingFace
 
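For context on the added recommendation: sampling temperature divides each logit before the softmax, so a value just below 1.0 (such as the recommended 0.9) mildly sharpens the distribution toward high-probability tokens. A minimal pure-Python sketch of this effect (illustrative only; not code from the repository):

```python
import math

def sample_probs(logits, temperature=0.9):
    """Convert raw logits into a sampling distribution, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

base = sample_probs([2.0, 1.0, 0.5], temperature=1.0)
sharp = sample_probs([2.0, 1.0, 0.5], temperature=0.9)
# Temperature below 1.0 shifts probability mass toward the top token.
assert sharp[0] > base[0]
```

In practice this value is passed as the `temperature` argument when calling the model's generation API.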