BigDong committed
Commit 4169e9a · 1 Parent(s): b4d7b0e

update readme

Files changed (1): README.md (+3 −1)
README.md CHANGED

@@ -50,7 +50,9 @@ MiniCPM-SALA is an efficient hybrid model in which 25% of the layers adopt [InfL
 - **Efficient Inference on Long Sequences**
 - Achieves up to 3.5x the inference speed of Qwen3-8B at a sequence length of 256K tokens on A6000D, supports inference at context lengths of up to 1M tokens on both NVIDIA A6000D and 5090 GPUs, whereas Qwen3-8B fails at this length due to out-of-memory (OOM) errors.
 
-## Usage
+## Inference
+
+To achieve optimal performance, we recommend using `Temperature=0.9`.
 
 ### HuggingFace
 
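For context on the added recommendation: sampling temperature divides each logit before the softmax, so a value just below 1.0 (such as the recommended 0.9) mildly sharpens the distribution toward high-probability tokens. A minimal pure-Python sketch of this effect (illustrative only; not code from the repository):

```python
import math

def sample_probs(logits, temperature=0.9):
    """Convert raw logits into a sampling distribution, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

base = sample_probs([2.0, 1.0, 0.5], temperature=1.0)
sharp = sample_probs([2.0, 1.0, 0.5], temperature=0.9)
# Temperature below 1.0 shifts probability mass toward the top token.
assert sharp[0] > base[0]
```

In practice this value is passed as the `temperature` argument when calling the model's generation API.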