feihu.hf committed on
Commit a68211a · 1 Parent(s): cba1e86

update README

Files changed (1)
  1. README.md +10 -3
README.md CHANGED
@@ -206,7 +206,14 @@ For full technical details, see the [Qwen2.5-1M Technical Report](https://arxiv.
 
 #### Step 1: Update Configuration File
 
- Replace the content of your `config.json` with `config_1m.json`, which includes the config for length extrapolation and sparse attention.
+ Download the model and replace the content of your `config.json` with `config_1m.json`, which includes the config for length extrapolation and sparse attention.
+
+ ```bash
+ export MODELNAME=Qwen3-235B-A22B-Instruct-2507
+ huggingface-cli download Qwen/${MODELNAME} --local-dir ${MODELNAME}
+ mv ${MODELNAME}/config.json ${MODELNAME}/config.json.bak
+ mv ${MODELNAME}/config_1m.json ${MODELNAME}/config.json
+ ```
 
 #### Step 2: Launch Model Server
 
@@ -226,7 +233,7 @@ Then launch the server with Dual Chunk Flash Attention enabled:
 
 ```bash
 VLLM_ATTENTION_BACKEND=DUAL_CHUNK_FLASH_ATTN VLLM_USE_V1=0 \
- vllm serve Qwen/Qwen3-235B-A22B-Instruct-2507 \
+ vllm serve ./Qwen3-235B-A22B-Instruct-2507 \
   --tensor-parallel-size 8 \
   --max-model-len 1010000 \
   --enable-chunked-prefill \
@@ -262,7 +269,7 @@ Launch the server with DCA support:
 
 ```bash
 python3 -m sglang.launch_server \
-   --model-path Qwen/Qwen3-235B-A22B-Instruct-2507 \
+   --model-path ./Qwen3-235B-A22B-Instruct-2507 \
    --context-length 1010000 \
    --mem-frac 0.75 \
    --attention-backend dual_chunk_flash_attn \
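
After the Step 1 swap, a quick sanity check (a minimal sketch, using only the files named in the commit) is to diff the backup against the activated config; the length-extrapolation and sparse-attention settings from `config_1m.json` should show up in the output.

```bash
# Sanity-check the Step 1 swap: the active config.json should now differ from
# the backup in the long-context fields contributed by config_1m.json.
export MODELNAME=Qwen3-235B-A22B-Instruct-2507
diff ${MODELNAME}/config.json.bak ${MODELNAME}/config.json || true  # diff exits 1 when the files differ, which is expected here
```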
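Once the vLLM server from Step 2 is up, a minimal smoke test of its OpenAI-compatible API could look like the sketch below. The port and the model name are assumptions: 8000 is vLLM's default port, and without `--served-model-name` vLLM registers the model under the path passed to `vllm serve`, so confirm the exact name via `/v1/models` first.

```bash
# Assumption: vLLM's OpenAI-compatible server on its default port 8000.
curl -s http://localhost:8000/v1/models   # confirm the served model name

# Minimal chat completion; the "model" value assumes the server was launched
# with the local path ./Qwen3-235B-A22B-Instruct-2507 as in the diff above.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "./Qwen3-235B-A22B-Instruct-2507",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 32
      }'
```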
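The SGLang launch can be checked the same way; the port here is also an assumption (SGLang's `launch_server` defaults to 30000 unless `--port` is passed), and the model name it reports can again be read from `/v1/models`.

```bash
# Assumption: SGLang's OpenAI-compatible server on its default port 30000.
curl -s http://localhost:30000/v1/models
curl -s http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "./Qwen3-235B-A22B-Instruct-2507", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 32}'
```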