Llama-3.2-Vision-chinese-lora
- Base model: meta-llama/Llama-3.2-11B-Vision-Instruct
Features
- Fine-tuned on a large amount of high-quality Chinese text and VQA data, significantly enhancing the model's Chinese OCR capabilities.
Use with transformers
import torch
from transformers import MllamaForConditionalGeneration, AutoProcessor
from peft import PeftModel
from PIL import Image
# Base model ID and LoRA model ID
base_model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
lora_model_id = "Kadins/Llama-3.2-Vision-chinese-lora"
# Load the processor
processor = AutoProcessor.from_pretrained(base_model_id)
# Load the base model
base_model = MllamaForConditionalGeneration.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.float16  # Use torch.bfloat16 if your hardware supports it
).eval()
# Load the LoRA model and apply it to the base model
model = PeftModel.from_pretrained(base_model, lora_model_id)
# Optionally, merge the LoRA weights with the base model for faster inference
model = model.merge_and_unload()
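# Optionally save the merged weights so future runs can load them directly,
# skipping the PEFT step (the directory name below is just an example):
# model.save_pretrained("Llama-3.2-Vision-chinese-merged")
# processor.save_pretrained("Llama-3.2-Vision-chinese-merged")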
# Load an example image (replace 'path_to_image.jpg' with your image file)
image_path = 'path_to_image.jpg'
image = Image.open(image_path).convert("RGB")  # Ensure a 3-channel RGB input
# User prompt in Chinese
user_prompt = "请描述这张图片。"
# Prepare the content with the image and text
content = [
    {"type": "image", "image": image},
    {"type": "text", "text": user_prompt}
]
# Apply the chat template to create the prompt
prompt = processor.apply_chat_template(
    [{"role": "user", "content": content}],
    add_generation_prompt=True
)
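# The chat template wraps the turn in Llama 3.2's special tokens and inserts
# an <|image|> placeholder marking where the image features will be attended to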
# Prepare the inputs for the model
inputs = processor(
    images=image,
    text=prompt,
    return_tensors="pt"
).to(model.device)
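# `inputs` now holds input_ids, attention_mask, and the preprocessed image
# tensors (e.g. pixel_values) consumed by Mllama's cross-attention layers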
# Generate the model's response
output = model.generate(**inputs, max_new_tokens=512)
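# Decoding behaviour can be tuned with standard generate() arguments such as
# do_sample, temperature, and top_p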
# Decode only the newly generated tokens (the raw output also contains the prompt)
response = processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
# Print the assistant's response
print("Assistant:", response)