littlebird13 committed
Commit 144e4ba · verified · Parent: b85cca4

Update README.md

Files changed (1): README.md +3 -9
README.md CHANGED
@@ -106,26 +106,20 @@ For deployment, you can use the latest `sglang` or `vllm` to create an OpenAI-co
  [SGLang](https://github.com/sgl-project/sglang) is a fast serving framework for large language models and vision language models.
  SGLang could be used to launch a server with OpenAI-compatible API service.
 
- `sglang>=0.5.2` is required for Qwen3-Next, which can be installed using:
+ `sglang>=v0.5.8` is required for Qwen3-Coder-Next, which can be installed using:
  ```shell
- pip install 'sglang[all]>=0.5.2'
+ pip install 'sglang[all]>=v0.5.8'
  ```
  See [its documentation](https://docs.sglang.ai/get_started/install.html) for more details.
 
  The following command can be used to create an API endpoint at `http://localhost:30000/v1` with maximum context length 256K tokens using tensor parallel on 4 GPUs.
  ```shell
- python -m sglang.launch_server --model-path Qwen/Qwen3-Next-80B-A3B-Instruct --port 30000 --tp-size 4 --context-length 262144 --mem-fraction-static 0.8
- ```
+ python -m sglang.launch_server --model Qwen/Qwen3-Coder-Next --tp-size 2 --tool-call-parser qwen3_coder
+ ```
 
- The following command is recommended for MTP with the rest settings the same as above:
- ```shell
- python -m sglang.launch_server --model-path Qwen/Qwen3-Next-80B-A3B-Instruct --port 30000 --tp-size 4 --context-length 262144 --mem-fraction-static 0.8 --speculative-algo NEXTN --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4
- ```
 
  > [!Note]
  > The default context length is 256K. Consider reducing the context length to a smaller value, e.g., `32768`, if the server fails to start.
 
- Please also refer to SGLang's usage guide on [Qwen3-Next](https://docs.sglang.ai/basic_usage/qwen3.html).
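Once a server like the one launched above is running, any OpenAI-compatible client can talk to it. Below is a minimal sketch using only the Python standard library; the base URL assumes the default local port `30000` from the README, the model name mirrors the diff, and `build_chat_request`/`send` are illustrative helpers, not part of SGLang or the OpenAI SDK:

```python
import json
import urllib.request

# Assumed endpoint: the sglang server from the README, on its default port.
BASE_URL = "http://localhost:30000/v1"


def build_chat_request(prompt: str,
                       model: str = "Qwen/Qwen3-Coder-Next") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request to the running server and return the parsed JSON reply."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With the server up, `send(build_chat_request("Write hello world in C."))["choices"][0]["message"]["content"]` would hold the completion text; the request-building step works offline, so it can be inspected before anything is sent.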
 
  ### vLLM