QuantTrio/GLM-4.5-Air-GPTQ-Int4-Int8Mix

Text Generation

quantization fix

4-bit precision


JunHowie committed 14 days ago

Commit 9414256 · verified · 1 Parent(s): 76f6dd3

Update README.md

Files changed (1)

README.md +1 -2


@@ -18,9 +18,8 @@ base_model_relation: quantized
  ### 【vLLM Single Node with 8 GPUs Startup Command】
  <i>Note: You must use `--enable-expert-parallel` to start this model, otherwise the expert tensor TP will not divide evenly. This is required even for 2 GPUs.</i>
-  ```
-  CONTEXT_LENGTH=32768
+```
 CONTEXT_LENGTH=32768
 VLLM_USE_MODELSCOPE=true vllm serve \