danielhanchen committed on
Commit 4577173 · verified · 1 Parent(s): 7622487

Add files using upload-large-folder tool

README.md CHANGED
@@ -1,59 +1,20 @@
  ---
- base_model: Qwen/Qwen2.5-VL-7B-Instruct
  language:
  - en
- library_name: transformers
  pipeline_tag: image-text-to-text
- license: apache-2.0
  tags:
  - multimodal
- - qwen
- - qwen2
  - unsloth
- - transformers
- - vision
  ---
- <div>
- <p style="margin-bottom: 0;margin-top:0;">
- <em>Unsloth's <a href="https://unsloth.ai/blog/dynamic-4bit">Dynamic 4-bit Quants</a> is selectively quantized, greatly improving accuracy over standard 4-bit.</em>
- </p>
- <div style="display: flex; gap: 5px; align-items: center;margin-top:0; ">
- <a href="https://github.com/unslothai/unsloth/">
- <img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="133">
- </a>
- <a href="https://discord.gg/unsloth">
- <img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
- </a>
- <a href="https://docs.unsloth.ai/">
- <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
- </a>
- </div>
- <h1 style="margin-top: 0rem;">Finetune LLMs 2-5x faster with 70% less memory via Unsloth</h2>
- </div>
- We have a free Google Colab Tesla T4 notebook for Qwen2-VL (7B) here: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2_VL_(7B)-Vision.ipynb
-
- ## ✨ Finetune for Free
-
- All notebooks are **beginner friendly**! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.
-
- | Unsloth supports | Free Notebooks | Performance | Memory use |
- |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------|
- | **Llama-3.2 (3B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb) | 2.4x faster | 58% less |
- | **Llama-3.2 (11B vision)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) | 2x faster | 60% less |
- | **Qwen2 VL (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2_VL_(7B)-Vision.ipynb) | 1.8x faster | 60% less |
- | **Qwen2.5 (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_(7B)-Alpaca.ipynb) | 2x faster | 60% less |
- | **Llama-3.1 (8B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb) | 2.4x faster | 58% less |
- | **Phi-3.5 (mini)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_3.5_Mini-Conversational.ipynb) | 2x faster | 50% less |
- | **Gemma 2 (9B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma2_(9B)-Alpaca.ipynb) | 2.4x faster | 58% less |
- | **Mistral (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_(7B)-Conversational.ipynb) | 2.2x faster | 62% less |
-
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="200"/>](https://docs.unsloth.ai)
-
- - This [Llama 3.2 conversational notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb) is useful for ShareGPT ChatML / Vicuna templates.
- - This [text completion notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_(7B)-Text_Completion.ipynb) is for raw text. This [DPO notebook](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing) replicates Zephyr.
- - \* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster.
-
- # Qwen2.5-VL

  ## Introduction

@@ -567,4 +528,3 @@ If you find our work helpful, feel free to give us a cite.
  year={2023}
  }
  ```
-
 
  ---
+ base_model:
+ - Qwen/Qwen2.5-VL-7B-Instruct
+ license: apache-2.0
  language:
  - en
  pipeline_tag: image-text-to-text
  tags:
  - multimodal
  - unsloth
+ library_name: transformers
  ---
+
+ # Qwen2.5-VL-7B-Instruct
+ <a href="https://chat.qwenlm.ai/" target="_blank" style="margin: 2px;">
+ <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
+ </a>

  ## Introduction

  year={2023}
  }
  ```
 
chat_template.jinja ADDED
@@ -0,0 +1,7 @@
+ {% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ {% endif %}<|im_start|>{{ message['role'] }}
+ {% if message['content'] is string %}{{ message['content'] }}<|im_end|>
+ {% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>
+ {% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant
+ {% endif %}
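For reference, a minimal sketch of how a template like this is typically rendered through the Transformers processor API; the repo id (the upstream base model) and the message structure are assumptions for illustration only:

```python
# Minimal sketch: render the vision chat template without tokenizing, to inspect
# the <|im_start|> / <|vision_start|><|image_pad|><|vision_end|> layout it produces.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "file:///path/to/demo.jpg"},  # example path
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```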
config.json CHANGED
@@ -1,5 +1,4 @@
  {
- "_name_or_path": "unsloth/Qwen2.5-VL-7B-Instruct",
  "architectures": [
  "Qwen2_5_VLForConditionalGeneration"
  ],
@@ -10,7 +9,7 @@
  "image_token_id": 151655,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
- "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2_5_vl",
  "num_attention_heads": 28,
@@ -31,66 +30,68 @@
  "multi_modal_projector",
  "merger",
  "modality_projection",
- "visual.blocks.24.attn",
- "visual.blocks.28.attn",
- "visual.blocks.31.mlp",
  "visual.blocks.27.attn",
  "visual.blocks.21.attn",
  "visual.blocks.29.mlp",
- "visual.blocks.30.attn",
- "visual.blocks.20.attn",
  "visual.blocks.31.attn",
  "visual.blocks.28.mlp",
  "visual.blocks.25.mlp",
- "visual.blocks.25.attn",
- "visual.blocks.27.mlp",
  "visual.blocks.24.mlp",
  "visual.blocks.17.attn",
- "visual.blocks.19.attn",
  "visual.blocks.23.attn",
- "visual.blocks.26.mlp",
- "visual.blocks.9.attn",
- "visual.blocks.16.attn",
  "visual.blocks.23.mlp",
- "visual.blocks.11.attn",
- "visual.blocks.20.mlp",
  "visual.blocks.21.mlp",
- "visual.blocks.10.attn",
  "visual.blocks.18.attn",
  "visual.blocks.22.mlp",
- "visual.blocks.6.attn",
- "visual.blocks.13.attn",
  "visual.blocks.18.mlp",
  "visual.blocks.12.attn",
  "visual.blocks.10.mlp",
- "visual.blocks.8.attn",
- "visual.blocks.11.mlp",
  "visual.blocks.8.mlp",
- "visual.blocks.19.mlp",
  "visual.blocks.7.mlp",
  "visual.blocks.5.mlp",
- "visual.blocks.16.mlp",
- "visual.blocks.13.mlp",
- "visual.blocks.4.mlp",
- "visual.blocks.14.attn",
- "visual.blocks.9.mlp",
  "visual.blocks.12.mlp",
  "visual.blocks.14.mlp",
  "visual.blocks.2.mlp",
- "visual.blocks.15.mlp",
- "visual.blocks.6.mlp",
  "visual.blocks.5.attn",
- "visual.blocks.3.mlp",
- "visual.blocks.15.attn",
  "visual.blocks.4.attn",
  "visual.blocks.1.mlp",
- "visual.blocks.2.attn",
- "visual.blocks.7.attn",
- "visual.blocks.1.attn",
  "visual.blocks.17.mlp",
- "visual.blocks.3.attn",
  "visual.blocks.0.attn",
- "visual.blocks.0.mlp"
  ],
  "llm_int8_threshold": 6.0,
  "load_in_4bit": true,
@@ -109,20 +110,76 @@
  },
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
- "transformers_version": "4.49.0",
  "unsloth_fixed": true,
  "use_cache": true,
  "use_sliding_window": false,
  "video_token_id": 151656,
  "vision_config": {
  "hidden_size": 1280,
  "in_chans": 3,
  "model_type": "qwen2_5_vl",
  "spatial_patch_size": 14,
  "tokens_per_second": 2,
- "torch_dtype": "bfloat16"
  },
  "vision_end_token_id": 151653,
  "vision_start_token_id": 151652,
 
  {
  "architectures": [
  "Qwen2_5_VLForConditionalGeneration"
  ],
  "image_token_id": 151655,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
+ "max_position_embeddings": 128000,
  "max_window_layers": 28,
  "model_type": "qwen2_5_vl",
  "num_attention_heads": 28,
  "multi_modal_projector",
  "merger",
  "modality_projection",
  "visual.blocks.27.attn",
+ "visual.blocks.28.attn",
+ "visual.blocks.25.attn",
+ "visual.blocks.22.attn",
  "visual.blocks.21.attn",
  "visual.blocks.29.mlp",
+ "visual.blocks.24.attn",
+ "visual.blocks.29.attn",
  "visual.blocks.31.attn",
+ "visual.blocks.30.attn",
  "visual.blocks.28.mlp",
+ "visual.blocks.20.attn",
  "visual.blocks.25.mlp",
+ "visual.blocks.19.attn",
+ "visual.blocks.26.mlp",
  "visual.blocks.24.mlp",
  "visual.blocks.17.attn",
+ "visual.blocks.27.mlp",
  "visual.blocks.23.attn",
  "visual.blocks.23.mlp",
  "visual.blocks.21.mlp",
+ "visual.blocks.19.mlp",
  "visual.blocks.18.attn",
+ "visual.blocks.20.mlp",
+ "visual.blocks.11.attn",
+ "visual.blocks.9.mlp",
+ "visual.blocks.9.attn",
+ "visual.blocks.16.attn",
+ "visual.blocks.11.mlp",
  "visual.blocks.22.mlp",
  "visual.blocks.18.mlp",
+ "visual.blocks.13.attn",
  "visual.blocks.12.attn",
+ "visual.blocks.6.attn",
  "visual.blocks.10.mlp",
  "visual.blocks.8.mlp",
+ "visual.blocks.8.attn",
+ "visual.blocks.14.attn",
+ "visual.blocks.4.mlp",
+ "visual.blocks.16.mlp",
  "visual.blocks.7.mlp",
+ "visual.blocks.6.mlp",
+ "visual.blocks.15.mlp",
  "visual.blocks.5.mlp",
+ "visual.blocks.10.attn",
+ "visual.blocks.3.mlp",
  "visual.blocks.12.mlp",
+ "visual.blocks.13.mlp",
  "visual.blocks.14.mlp",
  "visual.blocks.2.mlp",
  "visual.blocks.5.attn",
+ "visual.blocks.1.attn",
+ "visual.blocks.2.attn",
  "visual.blocks.4.attn",
+ "visual.blocks.3.attn",
+ "visual.blocks.15.attn",
  "visual.blocks.1.mlp",
  "visual.blocks.17.mlp",
  "visual.blocks.0.attn",
+ "visual.blocks.7.attn",
+ "visual.blocks.0.mlp",
+ "visual.blocks.31.mlp.down_proj"
  ],
  "llm_int8_threshold": 6.0,
  "load_in_4bit": true,
  },
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
+ "text_config": {
+ "architectures": [
+ "Qwen2_5_VLForConditionalGeneration"
+ ],
+ "attention_dropout": 0.0,
+ "bos_token_id": 151643,
+ "eos_token_id": 151645,
+ "hidden_act": "silu",
+ "hidden_size": 3584,
+ "image_token_id": null,
+ "initializer_range": 0.02,
+ "intermediate_size": 18944,
+ "max_position_embeddings": 128000,
+ "max_window_layers": 28,
+ "model_type": "qwen2_5_vl_text",
+ "num_attention_heads": 28,
+ "num_hidden_layers": 28,
+ "num_key_value_heads": 4,
+ "rms_norm_eps": 1e-06,
+ "rope_scaling": {
+ "mrope_section": [
+ 16,
+ 24,
+ 24
+ ],
+ "rope_type": "default",
+ "type": "default"
+ },
+ "rope_theta": 1000000.0,
+ "sliding_window": 32768,
+ "torch_dtype": "bfloat16",
+ "use_cache": true,
+ "use_sliding_window": false,
+ "video_token_id": null,
+ "vision_end_token_id": 151653,
+ "vision_start_token_id": 151652,
+ "vision_token_id": 151654,
+ "vocab_size": 152064
+ },
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
+ "transformers_version": "4.52.0.dev0",
  "unsloth_fixed": true,
  "use_cache": true,
  "use_sliding_window": false,
  "video_token_id": 151656,
  "vision_config": {
+ "depth": 32,
+ "fullatt_block_indexes": [
+ 7,
+ 15,
+ 23,
+ 31
+ ],
+ "hidden_act": "silu",
  "hidden_size": 1280,
+ "in_channels": 3,
  "in_chans": 3,
+ "initializer_range": 0.02,
+ "intermediate_size": 3420,
  "model_type": "qwen2_5_vl",
+ "num_heads": 16,
+ "out_hidden_size": 3584,
+ "patch_size": 14,
+ "spatial_merge_size": 2,
  "spatial_patch_size": 14,
+ "temporal_patch_size": 2,
  "tokens_per_second": 2,
+ "torch_dtype": "bfloat16",
+ "window_size": 112
  },
  "vision_end_token_id": 151653,
  "vision_start_token_id": 151652,
generation_config.json CHANGED
@@ -5,11 +5,9 @@
  151645,
  151643
  ],
- "max_length": 32768,
  "pad_token_id": 151654,
  "repetition_penalty": 1.05,
- "temperature": 0.1,
- "top_k": 1,
- "top_p": 0.001,
- "transformers_version": "4.49.0"
  }
 
  151645,
  151643
  ],
+ "max_length": 128000,
  "pad_token_id": 151654,
  "repetition_penalty": 1.05,
+ "temperature": 1e-06,
+ "transformers_version": "4.52.0.dev0"
  }
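These values become the model's default generation settings (the near-zero temperature makes decoding effectively greedy, and top_k/top_p are no longer pinned). A small sketch mirroring the updated file, showing how such defaults are expressed and overridden per call:

```python
# Sketch: defaults mirroring the updated generation_config.json above.
from transformers import GenerationConfig

gen_cfg = GenerationConfig(
    eos_token_id=[151645, 151643],
    pad_token_id=151654,
    repetition_penalty=1.05,
    temperature=1e-06,
    max_length=128000,
)

# Per-call overrides still take precedence, e.g. to restore sampling:
# model.generate(**inputs, generation_config=gen_cfg, do_sample=True, temperature=0.7, max_new_tokens=256)
```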
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:451d5705f2b6232b68e745223fa5a5edb86ba0eb678a003e0fd29936ea6138a8
- size 6851743942
 
  version https://git-lfs.github.com/spec/v1
+ oid sha256:5f2041306642ad99ee54b96a372a2603010ba678c86595a721c3100a4febf504
+ size 6858199244
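The weights file is tracked with Git LFS, so the diff only changes the pointer: the new oid and size identify the replacement shard. A quick sketch (the local path is an assumption) for checking a downloaded copy against this pointer:

```python
# Sketch: verify a locally downloaded model.safetensors against the LFS pointer above.
import hashlib
from pathlib import Path

expected_sha256 = "5f2041306642ad99ee54b96a372a2603010ba678c86595a721c3100a4febf504"
expected_size = 6858199244

path = Path("model.safetensors")  # adjust to your local download path
assert path.stat().st_size == expected_size, "size mismatch"

sha = hashlib.sha256()
with path.open("rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha.update(chunk)
assert sha.hexdigest() == expected_sha256, "hash mismatch"
print("model.safetensors matches the LFS pointer")
```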
tokenizer_config.json CHANGED
@@ -195,16 +195,16 @@
  "<|video_pad|>"
  ],
  "bos_token": null,
- "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- 'You are a helpful assistant.' }}\n {%- endif %}\n {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- else %}\n {{- '<|im_start|>system\\nYou are a helpful assistant.<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "extra_special_tokens": {},
- "model_max_length": 32768,
  "pad_token": "<|vision_pad|>",
  "padding_side": "left",
  "processor_class": "Qwen2_5_VLProcessor",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
- "unk_token": null
- }
 
  "<|video_pad|>"
  ],
  "bos_token": null,
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "extra_special_tokens": {},
+ "model_max_length": 128000,
  "pad_token": "<|vision_pad|>",
  "padding_side": "left",
  "processor_class": "Qwen2_5_VLProcessor",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null,
+ "chat_template": "{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n{% endif %}<|im_start|>{{ message['role'] }}\n{% if message['content'] is string %}{{ message['content'] }}<|im_end|>\n{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>\n{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
+ }
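With this change the template that replaces the old tool-calling one ships both as chat_template.jinja and inline in tokenizer_config.json, and the tokenizer's context length rises to 128k. A quick sketch (the repo id is a placeholder for this repository) to confirm what the loaded tokenizer reports:

```python
# Sketch: check that the tokenizer exposes the updated template and context length.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit")  # placeholder repo id
print(tok.model_max_length)    # expected: 128000 for this revision
print(tok.chat_template[:80])  # should start with the image/video namespace counters
```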