Update README.md

README.md CHANGED

@@ -106,26 +106,21 @@ For deployment, you can use the latest `sglang` or `vllm` to create an OpenAI-co
 [SGLang](https://github.com/sgl-project/sglang) is a fast serving framework for large language models and vision language models.
 SGLang can be used to launch a server with an OpenAI-compatible API service.
 
-`sglang>=
+`sglang>=v0.5.8` is required for Qwen3-Coder-Next, which can be installed using:
 ```shell
-pip install 'sglang[all]>=
+pip install 'sglang[all]>=v0.5.8'
 ```
 See [its documentation](https://docs.sglang.ai/get_started/install.html) for more details.
 
-The following command can be used to create an API endpoint at `http://localhost:30000/v1` with maximum context length 256K tokens using tensor parallel on 4 GPUs.
+The following command can be used to create an API endpoint at `http://localhost:30000/v1` with maximum context length 256K tokens using tensor parallel on 2 GPUs.
 ```shell
-python -m sglang.launch_server --model-path Qwen/Qwen3-Next-80B-A3B-Instruct --port 30000 --tp-size 4 --context-length 262144 --mem-fraction-static 0.8
+python -m sglang.launch_server --model Qwen/Qwen3-Coder-Next --tp-size 2 --tool-call-parser qwen3_coder
 ```
 
-The following command is recommended for MTP with the rest settings the same as above:
-```shell
-python -m sglang.launch_server --model-path Qwen/Qwen3-Next-80B-A3B-Instruct --port 30000 --tp-size 4 --context-length 262144 --mem-fraction-static 0.8 --speculative-algo NEXTN --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4
-```
 
 > [!Note]
 > The default context length is 256K. Consider reducing the context length to a smaller value, e.g., `32768`, if the server fails to start.
 
-Please also refer to SGLang's usage guide on [Qwen3-Next](https://docs.sglang.ai/basic_usage/qwen3.html).
 
 ### vLLM
 
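To confirm that the installed build satisfies the `sglang>=v0.5.8` requirement above, a minimal check (assuming sglang was installed via pip as shown) is:

```shell
# Print metadata for the installed sglang package; the Version line
# should read 0.5.8 or newer for Qwen3-Coder-Next.
pip show sglang | grep -i '^version'
```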
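Once the server is running, the OpenAI-compatible endpoint at `http://localhost:30000/v1` can be smoke-tested over plain HTTP. A minimal sketch, assuming the default port 30000 and that the served model name is `Qwen/Qwen3-Coder-Next`:

```shell
# Send a single chat request through the standard OpenAI-style
# /v1/chat/completions route and print the raw JSON response.
curl -s http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-Coder-Next",
    "messages": [{"role": "user", "content": "Write a hello-world in Python."}],
    "max_tokens": 64
  }'
```

Any OpenAI-compatible client should work the same way, e.g. an SDK pointed at `base_url="http://localhost:30000/v1"`.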
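If the server fails to start at the default 256K context, the note above suggests reducing the context length; `--context-length` (the flag used in the pre-change command) takes the smaller value directly. A minimal sketch with a 32K window and the other flags unchanged:

```shell
# Launch with a reduced 32K context window in case the 256K default
# exhausts GPU memory at startup.
python -m sglang.launch_server --model Qwen/Qwen3-Coder-Next --tp-size 2 \
  --tool-call-parser qwen3_coder --context-length 32768
```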