# Qwen2.5-VL-32B Tool Assistant with LoRA Fine-Tuning
This is a LoRA adapter for Qwen/Qwen2.5-VL-32B-Instruct, fine-tuned for tool use with visual input.
## Usage
```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from peft import PeftModel
import torch
from PIL import Image

# Load the processor and the base model (requires transformers >= 4.49)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-32B-Instruct")
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-32B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Attach the LoRA adapter
model = PeftModel.from_pretrained(
    base_model,
    "srai86825/qwen-vl-tool-assistant-lora",
)

# Build a chat-format prompt; the processor's chat template inserts
# the image placeholder tokens the model expects
image = Image.open("your_image.jpg")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is in this image?"},
        ],
    }
]
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

# Generate, then decode only the newly generated tokens
output_ids = model.generate(**inputs, max_new_tokens=100)
new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
result = processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
print(result)
```
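If you plan to serve the fine-tuned model as-is, the LoRA weights can also be merged into the base model so inference runs without the PEFT wrapper. This is the standard PEFT merge pattern, sketched here with a hypothetical output directory name:

```python
# Fold the LoRA weights into the base model (standard PEFT API);
# the merged model then behaves like a plain Qwen2.5-VL checkpoint.
# "qwen-vl-tool-assistant-merged" is an illustrative path, not an
# official artifact of this repository.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("qwen-vl-tool-assistant-merged")
processor.save_pretrained("qwen-vl-tool-assistant-merged")
```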
## Training Details
- Base model: Qwen/Qwen2.5-VL-32B-Instruct
- Fine-tuning method: LoRA with rank 8 (see the configuration sketch below)
- Target modules: all attention layers
- Training data: Custom tool-use dataset
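A training-time configuration equivalent to the above might look like the following sketch. Only the rank (r=8) and the attention-layer targeting are documented here; the `lora_alpha`, dropout value, and exact projection-module names are illustrative assumptions, not confirmed values for this adapter:

```python
from peft import LoraConfig, get_peft_model

# Hypothetical configuration; reuses `base_model` from the Usage snippet.
lora_config = LoraConfig(
    r=8,                         # documented rank
    lora_alpha=16,               # assumed; commonly set to 2 * r
    lora_dropout=0.05,           # assumed
    # Assumed interpretation of "all attention layers":
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # only the LoRA matrices are trainable
```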
## Model Architecture
This model uses the Low-Rank Adaptation (LoRA) technique to efficiently fine-tune the Qwen2.5-VL-32B-Instruct model. LoRA works by adding small, trainable rank decomposition matrices to existing weights, allowing for parameter-efficient fine-tuning.
The adapter is applied to all attention layers in the model, letting it learn new behavior while the base weights stay frozen.
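As a minimal numeric sketch of the idea (dimensions chosen for illustration, not taken from the actual model): LoRA replaces a frozen weight W with W + (alpha / r) * B @ A, where B and A are the small trainable matrices.

```python
import torch

# Toy LoRA update with rank r much smaller than the weight dimensions.
# In real LoRA training, B starts at zero; random values are used here
# only so the equivalence check below is non-trivial.
d, k, r, alpha = 64, 64, 8, 16
W = torch.randn(d, k)          # frozen pretrained weight
A = torch.randn(r, k) * 0.01   # trainable down-projection
B = torch.randn(d, r) * 0.01   # trainable up-projection
W_adapted = W + (alpha / r) * (B @ A)

x = torch.randn(k)
# Applying the adapted weight equals the base output plus a low-rank correction
assert torch.allclose(W_adapted @ x, W @ x + (alpha / r) * (B @ (A @ x)), atol=1e-4)
```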
## Limitations
- This model inherits the limitations of the base Qwen2.5-VL model
- The fine-tuning data may introduce biases or limitations in certain domains
- For optimal performance, use images similar in style and content to what the model was trained on