Intel/Qwen2.5-VL-7B-Instruct-int4-mixed-AutoRound

4-bit precision


wenhuach committed 8 days ago

Commit d513d71 · verified · 1 Parent(s): af149f3

Create README.md


README.md +33 -0


	@@ -0,0 +1,33 @@

+This is an example model showing how to run an AutoRound-format vision-language model on vLLM. Some visual modules are quantized to 8 bits.
+Running it requires the vLLM changes from this PR: https://github.com/vllm-project/vllm/pull/21802
+~~~bash
+vllm serve Intel/Qwen2.5-VL-7B-Instruct-int4-mixed-AutoRound --port 8001 --dtype bfloat16 --max-model-len 10000
+~~~
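
Loading the model can take some time, so a client may want to wait until the server responds before sending requests. The sketch below polls the OpenAI-compatible `/v1/models` route that vLLM serves, assuming port 8001 from the serve command. The helper name `wait_for_server` and the timeout values are illustrative assumptions, not part of vLLM or this repository.

```python
import http.client
import time
import urllib.request


def wait_for_server(base_url="http://localhost:8001", timeout_s=600.0, poll_s=5.0):
    """Poll GET {base_url}/v1/models until it answers HTTP 200 or timeout_s elapses.

    Returns True once the server answers, False if the deadline passes first.
    """
    # An empty ProxyHandler bypasses environment proxies, like curl --noproxy '*'.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener.open(f"{base_url}/v1/models", timeout=5) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # connection refused or HTTP error: server is not ready yet
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(poll_s, remaining))


# Usage (hypothetical), after starting `vllm serve` in another terminal:
# if wait_for_server():
#     print("server is ready")
```

Polling `/v1/models` rather than sleeping for a fixed time keeps the wait as short as the actual load time allows.
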
+~~~bash
+curl --noproxy '*' http://localhost:8001/v1/chat/completions -H "Content-Type: application/json" -d '{
+    "model": "Intel/Qwen2.5-VL-7B-Instruct-int4-mixed-AutoRound",
+    "messages": [
+      {
+        "role": "user",
+        "content": [
+          {
+            "type": "image_url",
+            "image_url": {
+              "url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
+            }
+          },
+          {
+            "type": "text",
+            "text": "Please describe this image."
+          }
+        ]
+      }
+    ],
+    "max_tokens": 512
+  }'
+~~~
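
For callers who prefer a script over curl, the following is a minimal sketch of a request with the same shape, using only the Python standard library. It assumes the server from this README is listening on port 8001 and returns the usual OpenAI-style `choices[0].message.content` field. The helper names `build_payload` and `describe_image` and the 120-second timeout are hypothetical choices made for illustration.

```python
import json
import urllib.request

MODEL = "Intel/Qwen2.5-VL-7B-Instruct-int4-mixed-AutoRound"
IMAGE_URL = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"


def build_payload(prompt, image_url=IMAGE_URL, model=MODEL, max_tokens=512):
    """Return a chat-completions body with one image part and one text part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


def describe_image(prompt, base_url="http://localhost:8001"):
    """POST the payload to the server and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # An empty ProxyHandler bypasses environment proxies, like curl --noproxy '*'.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (hypothetical), with the server running:
# print(describe_image("Please describe this image."))
```

Building the body with `json.dumps` avoids the shell-quoting pitfalls of an inline JSON string when the prompt contains quotes or non-ASCII text.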