Add link to paper in model card
This PR links the model card to the paper so users can easily find it: [Qwen2.5-VL Technical Report](https://huggingface.co/papers/2502.13923).
README.md (CHANGED)
@@ -1,16 +1,15 @@
-
 ---
+base_model:
+- Qwen/Qwen2.5-VL-72B-Instruct
+language:
+- en
+library_name: transformers
 license: other
 license_name: qwen
 license_link: https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct-AWQ/blob/main/LICENSE
-language:
-- en
 pipeline_tag: image-text-to-text
 tags:
 - multimodal
-library_name: transformers
-base_model:
-- Qwen/Qwen2.5-VL-72B-Instruct
 ---

 # Qwen2.5-VL-72B-Instruct-AWQ
@@ -18,6 +17,8 @@ base_model:
 <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
 </a>

+This repository contains the model described in the paper [Qwen2.5-VL Technical Report](https://huggingface.co/papers/2502.13923).
+
 ## Introduction

 In the past five months since Qwen2-VL’s release, numerous developers have built new models on the Qwen2-VL vision-language models, providing us with valuable feedback. During this period, we focused on building more useful vision-language models. Today, we are excited to introduce the latest addition to the Qwen family: Qwen2.5-VL.
@@ -84,7 +85,7 @@ KeyError: 'qwen2_5_vl'
 We offer a toolkit to help you handle various types of visual input more conveniently, as if you were using an API. This includes base64, URLs, and interleaved images and videos. You can install it using the following command:

 ```bash
-# It's highly
+# It's highly recommended to use `[decord]` feature for faster video loading.
 pip install qwen-vl-utils[decord]==0.0.8
 ```

@@ -95,7 +96,7 @@ If you are not using Linux, you might not be able to install `decord` from PyPI.
 Here we show a code snippet to show you how to use the chat model with `transformers` and `qwen_vl_utils`:

 ```python
-from transformers import Qwen2_5_VLForConditionalGeneration,
+from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
 from qwen_vl_utils import process_vision_info

 # default: Load the model on the available device(s)
@@ -210,7 +211,7 @@ The model supports a wide range of resolution inputs. By default, it uses the na
 min_pixels = 256 * 28 * 28
 max_pixels = 1280 * 28 * 28
 processor = AutoProcessor.from_pretrained(
-"Qwen/Qwen2.5-VL-
+"Qwen/Qwen2.5-VL-7B-Instruct", min_pixels=min_pixels, max_pixels=max_pixels
 )
 ```

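For context on the lines this diff touches (the `qwen-vl-utils` install, the `transformers` imports, and the `min_pixels`/`max_pixels` processor settings), the sketch below shows how those pieces fit together in the usage pattern the README describes. It is a minimal, illustrative example rather than the exact contents of the file: it loads this repository's checkpoint (`Qwen/Qwen2.5-VL-72B-Instruct-AWQ`) instead of the 7B model named in the snippet, and the image URL is a placeholder.

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "Qwen/Qwen2.5-VL-72B-Instruct-AWQ"

# Load the quantized model onto the available device(s).
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# min_pixels / max_pixels bound the resized image area, i.e. the visual token budget.
processor = AutoProcessor.from_pretrained(
    model_id, min_pixels=256 * 28 * 28, max_pixels=1280 * 28 * 28
)

# A single-image chat turn; the URL is a placeholder, and base64 data also works.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://example.com/demo.jpeg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Build the text prompt and extract image/video inputs from the messages.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

# Generate, strip the prompt tokens from the output, then decode.
generated_ids = model.generate(**inputs, max_new_tokens=128)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```

Per the README, `min_pixels` and `max_pixels` control how images are resized before tokenization; each visual token corresponds to roughly a 28x28 pixel patch, so the values above keep the per-image visual token count in roughly the 256-1280 range to balance quality against speed and memory.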