wenhuach committed d513d71 (parent af149f3): Create README.md

This is an example model showing how to run a vision-language model in AutoRound format on vLLM. Some of the visual modules have been quantized to 8 bits.

Running it requires vLLM PR https://github.com/vllm-project/vllm/pull/21802.

~~~bash
vllm serve Intel/Qwen2.5-VL-7B-Instruct-int4-mixed-AutoRound --port 8001 --dtype bfloat16 --max-model-len 10000
~~~
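
Loading the model can take a while before the endpoint starts answering. The following is a minimal sketch for waiting until the server is ready. It assumes the server above listens on `localhost:8001` and exposes the OpenAI-compatible `/v1/models` route; `wait_for_vllm` is a hypothetical helper name, and the retry count and delay are arbitrary defaults.

~~~bash
#!/usr/bin/env bash
# Hypothetical helper: poll the OpenAI-compatible /v1/models route until the
# vLLM server answers, or give up after a fixed number of attempts.
# Assumption: the server started above listens on localhost:8001.
wait_for_vllm() {
  local base_url="${1:-http://localhost:8001}"
  local retries="${2:-60}"  # number of attempts
  local delay="${3:-5}"     # seconds to sleep between attempts
  local i=1
  while [ "$i" -le "$retries" ]; do
    # -s: no progress output, -f: fail on HTTP errors, -m 5: cap each attempt at 5 s
    if curl --noproxy '*' -sf -m 5 "${base_url}/v1/models" > /dev/null; then
      echo "server ready at ${base_url}"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "server at ${base_url} not ready after ${retries} attempt(s)" >&2
  return 1
}
~~~

For example, `wait_for_vllm http://localhost:8001 && echo ok` blocks until the server responds (up to about five minutes with the defaults) before you send requests.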

~~~bash
curl --noproxy '*' http://localhost:8001/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "Intel/Qwen2.5-VL-7B-Instruct-int4-mixed-AutoRound",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
          }
        },
        {
          "type": "text",
          "text": "Please describe this image."
        }
      ]
    }
  ],
  "max_tokens": 512
}'
~~~
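
To reuse the request above with other images or prompts, the request body can be generated by a small shell function. This is a sketch: `build_payload` and `ask_image` are hypothetical helper names, the server is assumed to be on `localhost:8001`, and no JSON escaping is done, so the URL and prompt must not contain double quotes, backslashes, or newlines.

~~~bash
#!/usr/bin/env bash
# Hypothetical helper: print the same chat-completions body as the example above,
# with the image URL, prompt, and max_tokens as parameters.
# Assumption: arguments contain no characters that need JSON escaping.
build_payload() {
  local image_url="$1"
  local prompt="$2"
  local max_tokens="${3:-512}"
  cat <<EOF
{
  "model": "Intel/Qwen2.5-VL-7B-Instruct-int4-mixed-AutoRound",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "${image_url}"}},
        {"type": "text", "text": "${prompt}"}
      ]
    }
  ],
  "max_tokens": ${max_tokens}
}
EOF
}

# Hypothetical helper: send the generated body to the server (assumed on localhost:8001).
# --data-binary @- reads the request body from stdin without altering it.
ask_image() {
  build_payload "$@" | curl --noproxy '*' -s http://localhost:8001/v1/chat/completions \
    -H "Content-Type: application/json" --data-binary @-
}
~~~

For example: `ask_image "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg" "Please describe this image."`. If prompts may contain quotes or newlines, building the body with `jq -n --arg` would handle the escaping instead of the plain heredoc.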