Text Generation · Transformers · Safetensors · step3p5 · conversational · custom_code · fp8
WinstonDeng committed (verified)
Commit 1056dee · 1 Parent(s): 2c335f3

Update README.md

Files changed (1):
  1. README.md +4 -0
README.md CHANGED
@@ -80,6 +80,10 @@ Performance of Step 3.5 Flash measured across **Reasoning**, **Coding**, and **A
 3. **BrowseComp (with Context Manager)**: When the effective context length exceeds a predefined threshold, the agent resets the context and restarts the agent loop. By contrast, Kimi K2.5 and DeepSeek-V3.2 used a "discard-all" strategy.
 4. **Decoding Cost**: Estimates are based on a methodology similar to, but more accurate than, the approach described in arxiv.org/abs/2507.19427
 
+### Recommended Inference Parameters
+1. For the general chat domain, we suggest `temperature=0.6, top_p=0.95`.
+2. For reasoning / agent scenarios, we recommend `temperature=1.0, top_p=0.95`.
+
 ## 4. Architecture Details
 
 Step 3.5 Flash is built on a **Sparse Mixture-of-Experts (MoE)** transformer architecture, optimized for high throughput and low VRAM usage during inference.
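To make the added sampling recommendations concrete, the sketch below shows what `temperature` and `top_p` do to the next-token distribution: temperature below 1.0 sharpens the distribution (the chat setting), 1.0 leaves it unchanged (the reasoning/agent setting), and top-p then samples only from the smallest set of tokens covering `top_p` probability mass. `sample_token` is a hypothetical stand-alone helper for illustration, not part of the model's serving stack or the `transformers` API.

```python
import math
import random

def sample_token(logits, temperature=1.0, top_p=0.95, rng=None):
    """Temperature + nucleus (top-p) sampling over raw logits.

    Hypothetical helper illustrating the recommended settings
    (temperature=0.6 for chat, temperature=1.0 for reasoning/agent,
    both with top_p=0.95); not the model's actual inference code.
    """
    rng = rng or random.Random()
    # Temperature scaling: values < 1.0 sharpen the distribution,
    # 1.0 leaves the relative probabilities unchanged.
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus filtering: keep the smallest set of highest-probability
    # tokens whose cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize over the kept set and draw one token index.
    z = sum(probs[i] for i in kept)
    r, acc = rng.random() * z, 0.0
    for i in kept:
        acc += probs[i]
        if r <= acc:
            return i
    return kept[-1]
```

With a strongly peaked distribution and a low temperature, the nucleus collapses to the top token, so sampling becomes effectively greedy; at temperature 1.0 more of the tail survives the top-p cutoff, which matches the intent of the higher-temperature reasoning/agent setting.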