wenhuach committed on
Commit
dc54a36
·
verified ·
1 Parent(s): af9a2ee

Update README.md

  1. README.md +39 -1
README.md CHANGED
@@ -1,11 +1,17 @@
1
  ---
2
  base_model:
3
  - Qwen/Qwen2.5-VL-7B-Instruct
 
4
  ---
 
 
5
  This is an example model demonstrating how to run the AutoRound format for a visual language model on vLLM. Some visual modules have been quantized to 8-bit precision.
6
 
7
- this pr https://github.com/vllm-project/vllm/pull/21802 is required.
8
 
 
 
 
 
9
 
10
  ~~~bash
11
  vllm serve Intel/Qwen2.5-VL-7B-Instruct-int4-mixed-AutoRound --dtype bfloat16 --port 8001 --max-model-len 10000
@@ -33,5 +39,37 @@ curl --noproxy '*' http://localhost:8001/v1/chat/completions -H "Content-Typ
33
  ],
34
  "max_tokens": 512
35
  }'
36
 
 
 
37
  ~~~
 
1
  ---
2
  base_model:
3
  - Qwen/Qwen2.5-VL-7B-Instruct
4
+ license: apache-2.0
5
  ---
6
+
7
+ ## Model Details
8
  This is an example model demonstrating how to run the AutoRound format for a visual language model on vLLM. Some visual modules have been quantized to 8-bit precision.
9
 
 
10
 
11
+ ## Run The Model
12
+
13
+
14
+ Running this model on vLLM requires this PR: https://github.com/vllm-project/vllm/pull/21802
15
 
16
  ~~~bash
17
  vllm serve Intel/Qwen2.5-VL-7B-Instruct-int4-mixed-AutoRound --dtype bfloat16 --port 8001 --max-model-len 10000
 
39
  ],
40
  "max_tokens": 512
41
  }'
42
+ ~~~
43
+
44
+
45
+
46
+ ## Generate The Model
47
+
48
+ ~~~python
49
+ import torch
50
+ from auto_round import AutoRoundMLLM
51
+ from transformers import Qwen2_5_VLForConditionalGeneration, AutoTokenizer, AutoProcessor
52
+
53
+ model_name = "Qwen/Qwen2.5-VL-7B-Instruct"
54
+
55
+ # default: Load the model on the available device(s)
56
+ model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
57
+ model_name, torch_dtype="auto", device_map="auto"
58
+ )
59
+
60
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
61
+
62
+ processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
63
+ layer_config = {}
64
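+ for n, m in model.named_modules():
65
+     if "visual" in n:
66
+         if not isinstance(m, torch.nn.Linear):
67
+             continue  # only Linear layers in the visual tower get an override
68
+         if "mlp.gate_proj" in n or "mlp.down_proj" in n or "mlp.up_proj" in n:
69
+             layer_config[n] = {"bits": 16}  # keep visual MLP projections in 16-bit
70
+         else:
71
+             layer_config[n] = {"bits": 8}  # quantize the remaining visual Linear layers to 8-bit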
72
 
73
+ autoround = AutoRoundMLLM(model, tokenizer, processor=processor, iters=200, group_size=128, layer_config=layer_config)
74
+ autoround.quantize_and_save("./Qwen2.5-VL-7B-Instruct-autoround")
75
  ~~~
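
The per-layer rule used in the generation script can be checked without loading the model. The following is a minimal sketch of that rule in plain Python. The helper names (`bits_for_module`, `build_layer_config`) and the example module names are hypothetical and chosen only for illustration; the real names come from `model.named_modules()`, and modules without an override fall back to whatever default AutoRound applies.

```python
# Substrings that mark the visual MLP projections kept at 16 bits in the script above.
MLP_PROJECTIONS = ("mlp.gate_proj", "mlp.down_proj", "mlp.up_proj")


def bits_for_module(name, is_linear):
    """Return the bit-width override for a module, or None for no override."""
    # Only Linear layers inside the visual tower receive an override.
    if "visual" not in name or not is_linear:
        return None
    # Visual MLP projections stay at 16 bits; other visual Linear layers use 8 bits.
    if any(proj in name for proj in MLP_PROJECTIONS):
        return 16
    return 8


def build_layer_config(modules):
    """Build a layer_config dict from (name, is_linear) pairs."""
    config = {}
    for name, is_linear in modules:
        bits = bits_for_module(name, is_linear)
        if bits is not None:
            config[name] = {"bits": bits}
    return config


# Hypothetical module names, for illustration only.
example_modules = [
    ("visual.blocks.0.attn.qkv", True),
    ("visual.blocks.0.mlp.gate_proj", True),
    ("visual.blocks.0.norm1", False),
    ("model.layers.0.self_attn.q_proj", True),
]

layer_config = build_layer_config(example_modules)
print(layer_config)
```

With a real model, the same pairs could be produced as `(n, isinstance(m, torch.nn.Linear))` for each entry of `model.named_modules()`, which makes it easy to inspect the resulting overrides before starting a quantization run.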