danielhanchen committed on
Commit 4577173 · verified · 1 Parent(s): 7622487

Add files using upload-large-folder tool

README.md CHANGED
@@ -1,59 +1,20 @@
  ---
- base_model: Qwen/Qwen2.5-VL-7B-Instruct
  language:
  - en
- library_name: transformers
  pipeline_tag: image-text-to-text
- license: apache-2.0
  tags:
  - multimodal
- - qwen
- - qwen2
  - unsloth
- - transformers
- - vision
  ---
- <div>
- <p style="margin-bottom: 0;margin-top:0;">
- <em>Unsloth's <a href="https://unsloth.ai/blog/dynamic-4bit">Dynamic 4-bit Quants</a> is selectively quantized, greatly improving accuracy over standard 4-bit.</em>
- </p>
- <div style="display: flex; gap: 5px; align-items: center;margin-top:0; ">
- <a href="https://github.com/unslothai/unsloth/">
- <img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="133">
- </a>
- <a href="https://discord.gg/unsloth">
- <img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
- </a>
- <a href="https://docs.unsloth.ai/">
- <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
- </a>
- </div>
- <h1 style="margin-top: 0rem;">Finetune LLMs 2-5x faster with 70% less memory via Unsloth</h2>
- </div>
- We have a free Google Colab Tesla T4 notebook for Qwen2-VL (7B) here: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2_VL_(7B)-Vision.ipynb
-
- ## ✨ Finetune for Free
-
- All notebooks are **beginner friendly**! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.
-
- | Unsloth supports | Free Notebooks | Performance | Memory use |
- |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------|
- | **Llama-3.2 (3B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb) | 2.4x faster | 58% less |
- | **Llama-3.2 (11B vision)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) | 2x faster | 60% less |
- | **Qwen2 VL (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2_VL_(7B)-Vision.ipynb) | 1.8x faster | 60% less |
- | **Qwen2.5 (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_(7B)-Alpaca.ipynb) | 2x faster | 60% less |
- | **Llama-3.1 (8B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb) | 2.4x faster | 58% less |
- | **Phi-3.5 (mini)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_3.5_Mini-Conversational.ipynb) | 2x faster | 50% less |
- | **Gemma 2 (9B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma2_(9B)-Alpaca.ipynb) | 2.4x faster | 58% less |
- | **Mistral (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_(7B)-Conversational.ipynb) | 2.2x faster | 62% less |
-
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="200"/>](https://docs.unsloth.ai)
-
- - This [Llama 3.2 conversational notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb) is useful for ShareGPT ChatML / Vicuna templates.
- - This [text completion notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_(7B)-Text_Completion.ipynb) is for raw text. This [DPO notebook](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing) replicates Zephyr.
- - \* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster.
-
- # Qwen2.5-VL

  ## Introduction

@@ -567,4 +528,3 @@ If you find our work helpful, feel free to give us a cite.
  year={2023}
  }
  ```
-
 
  ---
+ base_model:
+ - Qwen/Qwen2.5-VL-7B-Instruct
+ license: apache-2.0
  language:
  - en
  pipeline_tag: image-text-to-text
  tags:
  - multimodal
  - unsloth
+ library_name: transformers
  ---
+
+ # Qwen2.5-VL-7B-Instruct
+ <a href="https://chat.qwenlm.ai/" target="_blank" style="margin: 2px;">
+ <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
+ </a>

  ## Introduction

  year={2023}
  }
  ```
 
chat_template.jinja ADDED
@@ -0,0 +1,7 @@
+ {% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ {% endif %}<|im_start|>{{ message['role'] }}
+ {% if message['content'] is string %}{{ message['content'] }}<|im_end|>
+ {% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>
+ {% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant
+ {% endif %}
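For reference, a minimal sketch of how a template like this is typically rendered through the Transformers processor API; the repo id (the upstream base model) and the message structure are assumptions for illustration only:

```python
# Minimal sketch: render the vision chat template without tokenizing, to inspect
# the <|im_start|> / <|vision_start|><|image_pad|><|vision_end|> layout it produces.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "file:///path/to/demo.jpg"},  # example path
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```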
config.json CHANGED
@@ -1,5 +1,4 @@
  {
- "_name_or_path": "unsloth/Qwen2.5-VL-7B-Instruct",
  "architectures": [
  "Qwen2_5_VLForConditionalGeneration"
  ],
@@ -10,7 +9,7 @@
  "image_token_id": 151655,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
- "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2_5_vl",
  "num_attention_heads": 28,
@@ -31,66 +30,68 @@
  "multi_modal_projector",
  "merger",
  "modality_projection",
- "visual.blocks.24.attn",
- "visual.blocks.28.attn",
- "visual.blocks.31.mlp",
  "visual.blocks.27.attn",
  "visual.blocks.21.attn",
  "visual.blocks.29.mlp",
- "visual.blocks.30.attn",
- "visual.blocks.20.attn",
  "visual.blocks.31.attn",
  "visual.blocks.28.mlp",
  "visual.blocks.25.mlp",
- "visual.blocks.25.attn",
- "visual.blocks.27.mlp",
  "visual.blocks.24.mlp",
  "visual.blocks.17.attn",
- "visual.blocks.19.attn",
  "visual.blocks.23.attn",
- "visual.blocks.26.mlp",
- "visual.blocks.9.attn",
- "visual.blocks.16.attn",
  "visual.blocks.23.mlp",
- "visual.blocks.11.attn",
- "visual.blocks.20.mlp",
  "visual.blocks.21.mlp",
- "visual.blocks.10.attn",
  "visual.blocks.18.attn",
  "visual.blocks.22.mlp",
- "visual.blocks.6.attn",
- "visual.blocks.13.attn",
  "visual.blocks.18.mlp",
  "visual.blocks.12.attn",
  "visual.blocks.10.mlp",
- "visual.blocks.8.attn",
- "visual.blocks.11.mlp",
  "visual.blocks.8.mlp",
- "visual.blocks.19.mlp",
  "visual.blocks.7.mlp",
  "visual.blocks.5.mlp",
- "visual.blocks.16.mlp",
- "visual.blocks.13.mlp",
- "visual.blocks.4.mlp",
- "visual.blocks.14.attn",
- "visual.blocks.9.mlp",
  "visual.blocks.12.mlp",
  "visual.blocks.14.mlp",
  "visual.blocks.2.mlp",
- "visual.blocks.15.mlp",
- "visual.blocks.6.mlp",
  "visual.blocks.5.attn",
- "visual.blocks.3.mlp",
- "visual.blocks.15.attn",
  "visual.blocks.4.attn",
  "visual.blocks.1.mlp",
- "visual.blocks.2.attn",
- "visual.blocks.7.attn",
- "visual.blocks.1.attn",
  "visual.blocks.17.mlp",
- "visual.blocks.3.attn",
  "visual.blocks.0.attn",
- "visual.blocks.0.mlp"
  ],
  "llm_int8_threshold": 6.0,
  "load_in_4bit": true,
@@ -109,20 +110,76 @@
  },
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
- "transformers_version": "4.49.0",
  "unsloth_fixed": true,
  "use_cache": true,
  "use_sliding_window": false,
  "video_token_id": 151656,
  "vision_config": {
  "hidden_size": 1280,
  "in_chans": 3,
  "model_type": "qwen2_5_vl",
  "spatial_patch_size": 14,
  "tokens_per_second": 2,
- "torch_dtype": "bfloat16"
  },
  "vision_end_token_id": 151653,
  "vision_start_token_id": 151652,
 
  {
  "architectures": [
  "Qwen2_5_VLForConditionalGeneration"
  ],
  "image_token_id": 151655,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
+ "max_position_embeddings": 128000,
  "max_window_layers": 28,
  "model_type": "qwen2_5_vl",
  "num_attention_heads": 28,
  "multi_modal_projector",
  "merger",
  "modality_projection",
  "visual.blocks.27.attn",
+ "visual.blocks.28.attn",
+ "visual.blocks.25.attn",
+ "visual.blocks.22.attn",
  "visual.blocks.21.attn",
  "visual.blocks.29.mlp",
+ "visual.blocks.24.attn",
+ "visual.blocks.29.attn",
  "visual.blocks.31.attn",
+ "visual.blocks.30.attn",
  "visual.blocks.28.mlp",
+ "visual.blocks.20.attn",
  "visual.blocks.25.mlp",
+ "visual.blocks.19.attn",
+ "visual.blocks.26.mlp",
  "visual.blocks.24.mlp",
  "visual.blocks.17.attn",
+ "visual.blocks.27.mlp",
  "visual.blocks.23.attn",
  "visual.blocks.23.mlp",
  "visual.blocks.21.mlp",
+ "visual.blocks.19.mlp",
  "visual.blocks.18.attn",
+ "visual.blocks.20.mlp",
+ "visual.blocks.11.attn",
+ "visual.blocks.9.mlp",
+ "visual.blocks.9.attn",
+ "visual.blocks.16.attn",
+ "visual.blocks.11.mlp",
  "visual.blocks.22.mlp",
  "visual.blocks.18.mlp",
+ "visual.blocks.13.attn",
  "visual.blocks.12.attn",
+ "visual.blocks.6.attn",
  "visual.blocks.10.mlp",
  "visual.blocks.8.mlp",
+ "visual.blocks.8.attn",
+ "visual.blocks.14.attn",
+ "visual.blocks.4.mlp",
+ "visual.blocks.16.mlp",
  "visual.blocks.7.mlp",
+ "visual.blocks.6.mlp",
+ "visual.blocks.15.mlp",
  "visual.blocks.5.mlp",
+ "visual.blocks.10.attn",
+ "visual.blocks.3.mlp",
  "visual.blocks.12.mlp",
+ "visual.blocks.13.mlp",
  "visual.blocks.14.mlp",
  "visual.blocks.2.mlp",
  "visual.blocks.5.attn",
+ "visual.blocks.1.attn",
+ "visual.blocks.2.attn",
  "visual.blocks.4.attn",
+ "visual.blocks.3.attn",
+ "visual.blocks.15.attn",
  "visual.blocks.1.mlp",
  "visual.blocks.17.mlp",
  "visual.blocks.0.attn",
+ "visual.blocks.7.attn",
+ "visual.blocks.0.mlp",
+ "visual.blocks.31.mlp.down_proj"
  ],
  "llm_int8_threshold": 6.0,
  "load_in_4bit": true,
  },
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
+ "text_config": {
+ "architectures": [
+ "Qwen2_5_VLForConditionalGeneration"
+ ],
+ "attention_dropout": 0.0,
+ "bos_token_id": 151643,
+ "eos_token_id": 151645,
+ "hidden_act": "silu",
+ "hidden_size": 3584,
+ "image_token_id": null,
+ "initializer_range": 0.02,
+ "intermediate_size": 18944,
+ "max_position_embeddings": 128000,
+ "max_window_layers": 28,
+ "model_type": "qwen2_5_vl_text",
+ "num_attention_heads": 28,
+ "num_hidden_layers": 28,
+ "num_key_value_heads": 4,
+ "rms_norm_eps": 1e-06,
+ "rope_scaling": {
+ "mrope_section": [
+ 16,
+ 24,
+ 24
+ ],
+ "rope_type": "default",
+ "type": "default"
+ },
+ "rope_theta": 1000000.0,
+ "sliding_window": 32768,
+ "torch_dtype": "bfloat16",
+ "use_cache": true,
+ "use_sliding_window": false,
+ "video_token_id": null,
+ "vision_end_token_id": 151653,
+ "vision_start_token_id": 151652,
+ "vision_token_id": 151654,
+ "vocab_size": 152064
+ },
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
+ "transformers_version": "4.52.0.dev0",
  "unsloth_fixed": true,
  "use_cache": true,
  "use_sliding_window": false,
  "video_token_id": 151656,
  "vision_config": {
+ "depth": 32,
+ "fullatt_block_indexes": [
+ 7,
+ 15,
+ 23,
+ 31
+ ],
+ "hidden_act": "silu",
  "hidden_size": 1280,
+ "in_channels": 3,
  "in_chans": 3,
+ "initializer_range": 0.02,
+ "intermediate_size": 3420,
  "model_type": "qwen2_5_vl",
+ "num_heads": 16,
+ "out_hidden_size": 3584,
+ "patch_size": 14,
+ "spatial_merge_size": 2,
  "spatial_patch_size": 14,
+ "temporal_patch_size": 2,
  "tokens_per_second": 2,
+ "torch_dtype": "bfloat16",
+ "window_size": 112
  },
  "vision_end_token_id": 151653,
  "vision_start_token_id": 151652,
generation_config.json CHANGED
@@ -5,11 +5,9 @@
  151645,
  151643
  ],
- "max_length": 32768,
  "pad_token_id": 151654,
  "repetition_penalty": 1.05,
- "temperature": 0.1,
- "top_k": 1,
- "top_p": 0.001,
- "transformers_version": "4.49.0"
  }
 
  151645,
  151643
  ],
+ "max_length": 128000,
  "pad_token_id": 151654,
  "repetition_penalty": 1.05,
+ "temperature": 1e-06,
+ "transformers_version": "4.52.0.dev0"
  }
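These values become the model's default generation settings (the near-zero temperature makes decoding effectively greedy, and top_k/top_p are no longer pinned). A small sketch mirroring the updated file, showing how such defaults are expressed and overridden per call:

```python
# Sketch: defaults mirroring the updated generation_config.json above.
from transformers import GenerationConfig

gen_cfg = GenerationConfig(
    eos_token_id=[151645, 151643],
    pad_token_id=151654,
    repetition_penalty=1.05,
    temperature=1e-06,
    max_length=128000,
)

# Per-call overrides still take precedence, e.g. to restore sampling:
# model.generate(**inputs, generation_config=gen_cfg, do_sample=True, temperature=0.7, max_new_tokens=256)
```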
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:451d5705f2b6232b68e745223fa5a5edb86ba0eb678a003e0fd29936ea6138a8
- size 6851743942
 
  version https://git-lfs.github.com/spec/v1
+ oid sha256:5f2041306642ad99ee54b96a372a2603010ba678c86595a721c3100a4febf504
+ size 6858199244
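The weights file is tracked with Git LFS, so the diff only changes the pointer: the new oid and size identify the replacement shard. A quick sketch (the local path is an assumption) for checking a downloaded copy against this pointer:

```python
# Sketch: verify a locally downloaded model.safetensors against the LFS pointer above.
import hashlib
from pathlib import Path

expected_sha256 = "5f2041306642ad99ee54b96a372a2603010ba678c86595a721c3100a4febf504"
expected_size = 6858199244

path = Path("model.safetensors")  # adjust to your local download path
assert path.stat().st_size == expected_size, "size mismatch"

sha = hashlib.sha256()
with path.open("rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha.update(chunk)
assert sha.hexdigest() == expected_sha256, "hash mismatch"
print("model.safetensors matches the LFS pointer")
```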
tokenizer_config.json CHANGED
@@ -195,16 +195,16 @@
  "<|video_pad|>"
  ],
  "bos_token": null,
- "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- 'You are a helpful assistant.' }}\n {%- endif %}\n {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- else %}\n {{- '<|im_start|>system\\nYou are a helpful assistant.<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "extra_special_tokens": {},
- "model_max_length": 32768,
  "pad_token": "<|vision_pad|>",
  "padding_side": "left",
  "processor_class": "Qwen2_5_VLProcessor",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
- "unk_token": null
- }
 
  "<|video_pad|>"
  ],
  "bos_token": null,
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "extra_special_tokens": {},
+ "model_max_length": 128000,
  "pad_token": "<|vision_pad|>",
  "padding_side": "left",
  "processor_class": "Qwen2_5_VLProcessor",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null,
+ "chat_template": "{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n{% endif %}<|im_start|>{{ message['role'] }}\n{% if message['content'] is string %}{{ message['content'] }}<|im_end|>\n{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>\n{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
+ }
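With this change the template that replaces the old tool-calling one ships both as chat_template.jinja and inline in tokenizer_config.json, and the tokenizer's context length rises to 128k. A quick sketch (the repo id is a placeholder for this repository) to confirm what the loaded tokenizer reports:

```python
# Sketch: check that the tokenizer exposes the updated template and context length.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit")  # placeholder repo id
print(tok.model_max_length)    # expected: 128000 for this revision
print(tok.chat_template[:80])  # should start with the image/video namespace counters
```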