
CultureCLIP Model (LoRA Merged)

This is a CLIP model fine-tuned with LoRA for cultural understanding and image-text matching. The LoRA weights have been merged into the base model.

Model Details

  • Base Model: openai/clip-vit-base-patch32
  • Task: Contrastive Image-Text Learning
  • Framework: PyTorch
  • Fine-tuning Approach: LoRA (Low-Rank Adaptation)

LoRA Configuration

  • Rank (r): 4
  • Alpha: 16
  • Dropout: 0.1
  • Target Modules: v_proj, q_proj
  • Task Type: FEATURE_EXTRACTION
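
The configuration above corresponds to the following peft setup; this is a minimal sketch for reference, not the exact training script used for this model:

from peft import LoraConfig

# LoRA hyperparameters matching the values listed above
lora_config = LoraConfig(
    r=4,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],
    task_type="FEATURE_EXTRACTION",
)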

Usage

from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load model and processor
model = CLIPModel.from_pretrained("lukahh/cultureclip_lora_0315_100k_32_1_0")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")  # Use base model's processor

# Load an example image (replace with your own file)
image = Image.open("example.jpg")

# Process text and images
inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image,
    return_tensors="pt",
    padding=True
)

# Get outputs
outputs = model(**inputs)
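
The forward pass of CLIPModel returns image-text similarity logits; a common follow-up (not part of the original card) is to convert them into probabilities over the candidate captions:

# Similarity of the image to each text prompt, as probabilities
logits_per_image = outputs.logits_per_image  # shape: (num_images, num_texts)
probs = logits_per_image.softmax(dim=1)
print(probs)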

Training Details

This model was fine-tuned with LoRA adapters on the attention projections (q_proj, v_proj), and the adapter weights were then merged into the base model. LoRA keeps the number of trainable parameters small during fine-tuning, and the merge produces a standard CLIP checkpoint that can be loaded without a peft dependency at inference time.
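
With peft, a merge of this kind is typically done via merge_and_unload; the sketch below is illustrative (the adapter path is a placeholder), not the exact script used for this model:

from transformers import CLIPModel
from peft import PeftModel

# Attach the trained LoRA adapter to the base CLIP model (adapter path is a placeholder)
base = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
adapted = PeftModel.from_pretrained(base, "path/to/lora_adapter")

# Fold the low-rank updates into the base weights and save a standalone checkpoint
merged = adapted.merge_and_unload()
merged.save_pretrained("cultureclip_merged")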

Model size: 151M parameters (F32, safetensors)