tclf90 committed on
Commit 7da9560 · verified · 1 Parent(s): 57b42d7

Update README.md

Files changed (1): README.md +74 -4
README.md CHANGED
@@ -1,14 +1,85 @@
---
license: mit
language:
- zh
- en
base_model:
- - zai-org/GLM-4.5-Air-Base
- pipeline_tag: image-text-to-text
- library_name: transformers
---

# GLM-4.5V

<div align="center">

@@ -109,4 +180,3 @@ If you use this model, please cite the following paper:
url={https://arxiv.org/abs/2507.01006},
}
```
-

---
license: mit
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- glm4v_moe
- AWQ
- vLLM
language:
- zh
- en
base_model:
- zai-org/GLM-4.5V
base_model_relation: quantized

---
# GLM-4.5V-AWQ
Base model: [zai-org/GLM-4.5V](https://huggingface.co/zai-org/GLM-4.5V)

### 【vLLM Single Node with 4 GPUs — Startup Command】
<i>❗Required: Use `--enable-expert-parallel` when launching this model.
Without it, the `expert tensors` cannot be evenly partitioned.</i>

<i>❗Required for 8 GPUs: Use `prefill/decode disaggregated serving`
([reference](https://docs.vllm.ai/en/latest/examples/online_serving/disaggregated_serving.html)),
otherwise the `vision attention heads` cannot be evenly partitioned.</i>

```bash
CONTEXT_LENGTH=32768

vllm serve \
    QuantTrio/GLM-4.5V-AWQ \
    --served-model-name GLM-4.5V-AWQ \
    --tool-call-parser glm45 \
    --reasoning-parser glm45 \
    --enable-auto-tool-choice \
    --allowed-local-media-path / \
    --media-io-kwargs '{"video": {"num_frames": -1}}' \
    --enable-expert-parallel \
    --swap-space 16 \
    --max-num-seqs 512 \
    --max-model-len $CONTEXT_LENGTH \
    --max-seq-len-to-capture $CONTEXT_LENGTH \
    --gpu-memory-utilization 0.9 \
    --tensor-parallel-size 4 \
    --trust-remote-code \
    --disable-log-requests \
    --host 0.0.0.0 \
    --port 8000
```
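
Once the server is up, vLLM exposes an OpenAI-compatible API on the port configured above. The snippet below is a minimal sketch of an image-understanding request against that server; the local endpoint, the `EMPTY` API key, and the image URL are placeholders, not part of the original card.

```python
# Minimal sketch: send an image + text prompt to the server launched above.
# Assumes the server is reachable at localhost:8000 and that the image URL
# is a placeholder to be replaced with your own.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="GLM-4.5V-AWQ",  # must match --served-model-name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```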

### 【Dependencies / Installation】

As of **2025-08-12**, create a fresh Python environment and run:

```bash
# Patched vLLM (see: https://github.com/vllm-project/vllm/pull/22716)
git clone -b glm-45 https://github.com/zRzRzRzRzRzRzR/vllm.git
cd vllm
VLLM_USE_PRECOMPILED=1 pip install .

# Install preview build of Transformers with GLM-4.5V support
pip install transformers-v4.55.0-GLM-4.5V-preview
```
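
As a quick sanity check (a sketch, not part of the original card), you can confirm that the patched vLLM and the preview Transformers build are the ones picked up by the environment before launching the server:

```python
# Sanity check (sketch): verify the patched builds import cleanly from the
# current environment; the exact version strings will differ.
import transformers
import vllm

print("vllm:", vllm.__version__)
print("transformers:", transformers.__version__)
```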

### 【Logs】
```
2025-08-12
1. Initial commit
```

### 【Model Files】
| File Size | Last Updated |
|-----------|--------------|
| `57GB`    | `2025-08-12` |

### 【Model Download】
```python
from huggingface_hub import snapshot_download
snapshot_download('QuantTrio/GLM-4.5V-AWQ', cache_dir="your_local_path")
```
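
If you prefer an explicit directory over the Hub cache, `snapshot_download` also accepts `local_dir`; the sketch below uses a placeholder path, and the returned directory can then be passed to `vllm serve` in place of the repo id.

```python
# Alternative sketch: download into an explicit local directory (placeholder
# path) and serve from that directory instead of the Hub repo id.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    "QuantTrio/GLM-4.5V-AWQ",
    local_dir="/models/GLM-4.5V-AWQ",  # placeholder path
)
print(local_path)  # pass this path to `vllm serve`
```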

### 【Overview】
# GLM-4.5V

<div align="center">

url={https://arxiv.org/abs/2507.01006},
}
```