---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
license: apache-2.0
---

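## Model Details

This is an example model that demonstrates how to run an AutoRound-format vision-language model on vLLM. As the generation script below shows, the linear layers in the visual tower are quantized to 8 bits, except the MLP projections (`gate_proj`, `up_proj`, `down_proj`), which are kept at 16 bits.
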
## Run the Model

Running this model requires the vLLM changes from this PR: https://github.com/vllm-project/vllm/pull/21802

~~~bash
vllm serve Intel/Qwen2.5-VL-7B-Instruct-int4-mixed-AutoRound --dtype bfloat16 --port 8001 --max-model-len 10000
~~~

~~~bash
curl --noproxy '*' http://localhost:8001/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "Intel/Qwen2.5-VL-7B-Instruct-int4-mixed-AutoRound",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
          }
        },
        {
          "type": "text",
          "text": "Please describe this image."
        }
      ]
    }
  ],
  "max_tokens": 512
}'
~~~
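
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on `localhost:8001`. The helper names `build_chat_payload` and `send_chat_request` are illustrative, not part of vLLM, and the prompt string is just an example.

```python
import json
import urllib.request

# Assumed values, copied from the serve and curl commands above.
BASE_URL = "http://localhost:8001/v1"
MODEL = "Intel/Qwen2.5-VL-7B-Instruct-int4-mixed-AutoRound"
IMAGE_URL = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"


def build_chat_payload(image_url, prompt, model=MODEL, max_tokens=512):
    """Build the same JSON body that the curl command sends (hypothetical helper)."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


def send_chat_request(payload, base_url=BASE_URL, timeout=120):
    """POST the payload to /chat/completions and return the parsed JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # An empty ProxyHandler bypasses any configured proxy, like curl's --noproxy '*'.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_chat_payload(IMAGE_URL, "Please describe this image.")
# With the server running, uncomment to send the request and print the reply:
# reply = send_chat_request(payload)
# print(reply["choices"][0]["message"]["content"])
```

The network call is left commented out so the snippet can be run without a live server. In the OpenAI-compatible response format, the generated text is found under `choices[0].message.content`.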

## Generate the Model

~~~python
import torch
from auto_round import AutoRoundMLLM
from transformers import AutoProcessor, AutoTokenizer, Qwen2_5_VLForConditionalGeneration

model_name = "Qwen/Qwen2.5-VL-7B-Instruct"

# Load the model on the available device(s)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)

# Per-layer overrides for the visual tower: keep the MLP projections at
# 16 bits and quantize every other Linear layer to 8 bits.
layer_config = {}
for n, m in model.named_modules():
    if "visual" in n:
        if not isinstance(m, torch.nn.Linear):
            continue
        if "mlp.gate_proj" in n or "mlp.down_proj" in n or "mlp.up_proj" in n:
            layer_config[n] = {"bits": 16}
        else:
            layer_config[n] = {"bits": 8}

autoround = AutoRoundMLLM(
    model, tokenizer, processor=processor, iters=200, group_size=128, layer_config=layer_config
)
autoround.quantize_and_save("./Qwen2.5-VL-7B-Instruct-autoround")
~~~
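
The per-layer rule in the script can be checked without loading the model. The sketch below applies the same name-based rule to a handful of module names. The names in `example_names` are hypothetical examples rather than a dump of the real model, and the helper `bits_for_linear` is illustrative. It also assumes every listed name refers to a `torch.nn.Linear` layer, a check the real script performs with `isinstance`.

```python
from collections import Counter

# Substrings that mark the visual MLP projections kept at 16 bits.
MLP_PROJECTIONS = ("mlp.gate_proj", "mlp.up_proj", "mlp.down_proj")


def bits_for_linear(name):
    """Return the bit-width override for a layer name, or None outside the visual tower."""
    if "visual" not in name:
        return None  # no override: the layer is left out of layer_config
    if any(projection in name for projection in MLP_PROJECTIONS):
        return 16
    return 8


# Hypothetical module names, for illustration only.
example_names = [
    "visual.blocks.0.attn.qkv",
    "visual.blocks.0.attn.proj",
    "visual.blocks.0.mlp.gate_proj",
    "visual.blocks.0.mlp.up_proj",
    "visual.blocks.0.mlp.down_proj",
    "visual.merger.mlp.0",
    "model.layers.0.self_attn.q_proj",
]

layer_config = {}
for name in example_names:
    bits = bits_for_linear(name)
    if bits is not None:
        layer_config[name] = {"bits": bits}

# Count how many layers fall into each bit width.
bit_counts = Counter(config["bits"] for config in layer_config.values())
print(bit_counts)
```

The match is on the full substring, so a name such as `visual.merger.mlp.0` contains `mlp` but none of the three projection names and therefore receives the 8-bit override. Names without `visual` get no entry in `layer_config`.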