Text Generation · Transformers · Safetensors · step3p5 · conversational · custom_code · fp8
WinstonDeng committed (verified)
Commit 1056dee · 1 Parent(s): 2c335f3

Update README.md

Files changed (1):
  1. README.md +4 -0
README.md CHANGED
@@ -80,6 +80,10 @@ Performance of Step 3.5 Flash measured across **Reasoning**, **Coding**, and **A
 3. **BrowseComp (with Context Manager)**: When the effective context length exceeds a predefined threshold, the agent resets the context and restarts the agent loop. By contrast, Kimi K2.5 and DeepSeek-V3.2 used a "discard-all" strategy.
 4. **Decoding Cost**: Estimates are based on a methodology similar to, but more accurate than, the approach described in arxiv.org/abs/2507.19427
 
+### Recommended Inference Parameters
+1. For the general chat domain, we suggest `temperature=0.6, top_p=0.95`.
+2. For reasoning / agent scenarios, we recommend `temperature=1.0, top_p=0.95`.
+
 ## 4. Architecture Details
 
 Step 3.5 Flash is built on a **Sparse Mixture-of-Experts (MoE)** transformer architecture, optimized for high throughput and low VRAM usage during inference.
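To make the added sampling recommendations concrete, the sketch below shows what `temperature` and `top_p` do to the next-token distribution: temperature below 1.0 sharpens the distribution (the chat setting), 1.0 leaves it unchanged (the reasoning/agent setting), and top-p then samples only from the smallest set of tokens covering `top_p` probability mass. `sample_token` is a hypothetical stand-alone helper for illustration, not part of the model's serving stack or the `transformers` API.

```python
import math
import random

def sample_token(logits, temperature=1.0, top_p=0.95, rng=None):
    """Temperature + nucleus (top-p) sampling over raw logits.

    Hypothetical helper illustrating the recommended settings
    (temperature=0.6 for chat, temperature=1.0 for reasoning/agent,
    both with top_p=0.95); not the model's actual inference code.
    """
    rng = rng or random.Random()
    # Temperature scaling: values < 1.0 sharpen the distribution,
    # 1.0 leaves the relative probabilities unchanged.
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus filtering: keep the smallest set of highest-probability
    # tokens whose cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize over the kept set and draw one token index.
    z = sum(probs[i] for i in kept)
    r, acc = rng.random() * z, 0.0
    for i in kept:
        acc += probs[i]
        if r <= acc:
            return i
    return kept[-1]
```

With a strongly peaked distribution and a low temperature, the nucleus collapses to the top token, so sampling becomes effectively greedy; at temperature 1.0 more of the tail survives the top-p cutoff, which matches the intent of the higher-temperature reasoning/agent setting.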