littlebird13 committed · verified
Commit 116ac81 · 1 Parent(s): c151f17

Update README.md

Files changed (1): README.md (+9 -7)
README.md CHANGED
@@ -80,16 +80,18 @@ print("thinking content:", thinking_content)
  print("content:", content)
  ```
 
- For deployment, you can use `vllm>=0.8.5` or `sglang>=0.4.5.post2` to create an OpenAI-compatible API endpoint:
- - vLLM:
+ For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.4` to create an OpenAI-compatible API endpoint:
+ - SGLang:
  ```shell
- vllm serve Qwen/Qwen3-14B-FP8 --enable-reasoning --reasoning-parser deepseek_r1
+ python -m sglang.launch_server --model-path Qwen/Qwen3-14B-FP8 --reasoning-parser qwen3
  ```
- - SGLang:
+ - vLLM:
  ```shell
- python -m sglang.launch_server --model-path Qwen/Qwen3-14B-FP8 --reasoning-parser deepseek-r1
+ vllm serve Qwen/Qwen3-14B-FP8 --enable-reasoning --reasoning-parser deepseek_r1
  ```
 
+ For local use, applications such as llama.cpp, Ollama, LMStudio, and MLX-LM also support Qwen3.
+
  ## Note on FP8
 
  For convenience and performance, we have provided `fp8`-quantized model checkpoint for Qwen3, whose name ends with `-FP8`. The quantization method is fine-grained `fp8` quantization with block size of 128. You can find more details in the `quantization_config` field in `config.json`.
@@ -125,8 +127,8 @@ However, please pay attention to the following known issues:
  ## Switching Between Thinking and Non-Thinking Mode
 
  > [!TIP]
- > The `enable_thinking` switch is also available in APIs created by vLLM and SGLang.
- > Please refer to our documentation for [vLLM](https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes) and [SGLang](https://qwen.readthedocs.io/en/latest/deployment/sglang.html#thinking-non-thinking-modes) users.
+ > The `enable_thinking` switch is also available in APIs created by SGLang and vLLM.
+ > Please refer to our documentation for [SGLang](https://qwen.readthedocs.io/en/latest/deployment/sglang.html#thinking-non-thinking-modes) and [vLLM](https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes) users.
 
  ### `enable_thinking=True`
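As a client-side sketch of how a request to the OpenAI-compatible endpoint from the deployment commands would look, the snippet below builds the JSON body for a chat completion and toggles the `enable_thinking` switch via `chat_template_kwargs`. This is a hypothetical illustration, not part of the commit: the `build_chat_request` helper, the `http://localhost:8000/v1` base URL (the default vLLM port; SGLang defaults differ), and passing `chat_template_kwargs` in the request body are assumptions based on the Qwen documentation linked in the tip above — consult those pages for the authoritative parameters.

```python
import json

# Hypothetical helper: assembles the request body for an OpenAI-compatible
# /v1/chat/completions endpoint serving Qwen/Qwen3-14B-FP8.
# chat_template_kwargs carrying enable_thinking is an assumption based on
# the linked SGLang/vLLM thinking-mode docs.
def build_chat_request(messages, enable_thinking=True):
    return {
        "model": "Qwen/Qwen3-14B-FP8",
        "messages": messages,
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }

# Non-thinking mode: the model answers without a reasoning block.
payload = build_chat_request(
    [{"role": "user", "content": "Give me a short introduction to LLMs."}],
    enable_thinking=False,
)
print(json.dumps(payload, indent=2))

# Sending it requires one of the servers above to be running, e.g.:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",  # assumed default port
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

When `enable_thinking` is left at its default of `True`, the response is expected to include the reasoning content separately, parsed out by the `--reasoning-parser` flag given at launch.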