
CultureCLIP Model (LoRA Merged)

This is a CLIP model fine-tuned with LoRA for cultural understanding and image-text matching. The LoRA weights have been merged into the base model.

Model Details

  • Base Model: openai/clip-vit-base-patch32
  • Task: Contrastive Image-Text Learning
  • Framework: PyTorch
  • Fine-tuning Approach: LoRA (Low-Rank Adaptation)

LoRA Configuration

  • Rank (r): 4
  • Alpha: 16
  • Dropout: 0.1
  • Target Modules: v_proj, q_proj
  • Task Type: FEATURE_EXTRACTION
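
The configuration above corresponds to the following peft setup; this is a minimal sketch for reference, not the exact training script used for this model:

from peft import LoraConfig

# LoRA hyperparameters matching the values listed above
lora_config = LoraConfig(
    r=4,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],
    task_type="FEATURE_EXTRACTION",
)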

Usage

from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load model and processor
model = CLIPModel.from_pretrained("lukahh/cultureclip_lora_0315_100k_32_1_0")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")  # Use base model's processor

# Load an example image (replace with your own file)
image = Image.open("example.jpg")

# Process text and images
inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image,
    return_tensors="pt",
    padding=True
)

# Get outputs
outputs = model(**inputs)
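
The forward pass of CLIPModel returns image-text similarity logits; a common follow-up (not part of the original card) is to convert them into probabilities over the candidate captions:

# Similarity of the image to each text prompt, as probabilities
logits_per_image = outputs.logits_per_image  # shape: (num_images, num_texts)
probs = logits_per_image.softmax(dim=1)
print(probs)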

Training Details

This model was fine-tuned with LoRA adapters on the attention projections (q_proj, v_proj), and the adapter weights were then merged into the base model. LoRA keeps the number of trainable parameters small during fine-tuning, and the merge produces a standard CLIP checkpoint that can be loaded without a peft dependency at inference time.
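
With peft, a merge of this kind is typically done via merge_and_unload; the sketch below is illustrative (the adapter path is a placeholder), not the exact script used for this model:

from transformers import CLIPModel
from peft import PeftModel

# Attach the trained LoRA adapter to the base CLIP model (adapter path is a placeholder)
base = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
adapted = PeftModel.from_pretrained(base, "path/to/lora_adapter")

# Fold the low-rank updates into the base weights and save a standalone checkpoint
merged = adapted.merge_and_unload()
merged.save_pretrained("cultureclip_merged")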

Model size: 151M parameters (F32, safetensors)