# paligemma2-3b-lora-vqa-v21-enhanced-d8000-r8 (v2.1 Enhanced)
This is a v2.1 Enhanced LoRA adapter for PaliGemma-2 3B trained on VQA tasks.
## v2.1 Enhanced Improvements
- EOS Token Learning: explicit EOS tokens appended to each target answer for reliable generation termination (see the data-preparation sketch after this list)
- Memory Optimization: gradient accumulation over 16 steps for stable training under tight memory budgets
- VizWiz Format Support: full support, using the most frequent annotator answer as the training target
- Robust Label Masking: prompt tokens are masked out of the loss so only answer tokens are learned
- Production Memory Management: aggressive garbage collection during training
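
As a rough illustration of how the EOS and masking changes fit together, the sketch below builds training labels from a VizWiz-style example. The helper name, the layout of the `answers` field, and the exact prompt template are assumptions for illustration, not the verbatim training code:

```python
from collections import Counter

def build_labels(example, processor):
    # Pick the most frequent of the VizWiz annotator answers
    # (assumes `answers` is a list of answer strings).
    answer = Counter(example["answers"]).most_common(1)[0][0]

    prompt = f"<image>\nQuestion: {example['question']}\nAnswer:"
    # Append an explicit EOS token so the model learns to stop generating.
    target = " " + answer + processor.tokenizer.eos_token

    prompt_ids = processor.tokenizer(prompt, add_special_tokens=False).input_ids
    target_ids = processor.tokenizer(target, add_special_tokens=False).input_ids

    # Mask the prompt positions with -100 so only the answer
    # (including the EOS token) contributes to the loss.
    return {
        "input_ids": prompt_ids + target_ids,
        "labels": [-100] * len(prompt_ids) + target_ids,
    }
```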
## Usage
```python
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from peft import PeftModel
import torch
from PIL import Image

# Base model and adapter IDs
base_model_id = "google/paligemma2-3b-mix-224"
adapter_id = "yu3733/paligemma2-3b-lora-vqa-v21-enhanced-d8000-r8"

# Load processor
processor = AutoProcessor.from_pretrained(base_model_id)

# Load base model in half precision (see below for optional 4-bit loading)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)

# Prepare input
image = Image.open("your_image.jpg")
prompt = "<image>\nQuestion: What is in this image?\nAnswer:"

# Process
inputs = processor(text=prompt, images=image, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# Generate
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=20)

# Decode
print(processor.decode(outputs[0], skip_special_tokens=True))
```
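
For deployment you can optionally merge the adapter into the base model so inference no longer needs the PEFT wrapper. This is a sketch using standard PEFT calls and assumes the base weights were loaded unquantized, as in the fp16 example above:

```python
# Merge LoRA weights into the base model and drop the adapter wrapper.
merged = model.merge_and_unload()

# Save the merged model and processor for standalone use (path is illustrative).
merged.save_pretrained("paligemma2-3b-vqa-v21-merged")
processor.save_pretrained("paligemma2-3b-vqa-v21-merged")
```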
## Training Configuration
- Base Model: google/paligemma2-3b-mix-224
- LoRA Rank: 8
- Training Framework: PEFT + Transformers
- Optimization: 4-bit quantization + gradient checkpointing (a loading sketch follows this list)
- Dataset: VizWiz VQA
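
To reproduce the memory-constrained setup described above (4-bit base weights, gradient checkpointing, 16-step gradient accumulation), a minimal loading sketch with `bitsandbytes` might look like the following; the specific quantization settings are plausible defaults, not confirmed training values:

```python
import torch
from transformers import (
    BitsAndBytesConfig,
    PaliGemmaForConditionalGeneration,
    TrainingArguments,
)

# 4-bit NF4 quantization with fp16 compute (assumed settings).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = PaliGemmaForConditionalGeneration.from_pretrained(
    "google/paligemma2-3b-mix-224",
    quantization_config=bnb_config,
    device_map="auto",
)
# Trade compute for memory during training.
model.gradient_checkpointing_enable()

# Gradient accumulation matching the 16-step setup listed above.
args = TrainingArguments(
    output_dir="out",
    gradient_accumulation_steps=16,
    gradient_checkpointing=True,
    fp16=True,
)
```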
## License
Same as the base model (see google/paligemma2-3b-mix-224).