
CLIP ViT-H-14 Fine-tuned on Polaris Dataset

This model is a fine-tuned version of CLIP ViT-H-14 on the Polaris dataset, trained on one-to-one image-text pairs (each image paired with a single caption).

Model Details

  • Base Model: CLIP ViT-H-14
  • Dataset: Polaris
  • Training Mode: One-to-one image-text pairs (see the training sketch below)
  • Architecture: Vision Transformer (ViT) with CLIP text encoder
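
Training Sketch

The card does not include the training script, so the following is a minimal sketch of what one-to-one contrastive fine-tuning typically looks like with open_clip: the standard symmetric InfoNCE loss, where the i-th image's only positive is the i-th caption in the batch. The dummy batch, optimizer, and learning rate are illustrative assumptions, not the actual Polaris training configuration.

import torch
import torch.nn.functional as F
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms('ViT-H-14')
tokenizer = open_clip.get_tokenizer('ViT-H-14')
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)  # assumed hyperparameters
model.train()

# Stand-in for one Polaris batch: 4 preprocessed images and their captions
images = torch.randn(4, 3, 224, 224)
texts = tokenizer(["caption placeholder"] * 4)

# Embed and L2-normalize both modalities
image_features = F.normalize(model.encode_image(images), dim=-1)
text_features = F.normalize(model.encode_text(texts), dim=-1)

# Symmetric InfoNCE over the similarity matrix: diagonal entries are positives
logits = model.logit_scale.exp() * image_features @ text_features.t()
labels = torch.arange(logits.shape[0])
loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2

optimizer.zero_grad()
loss.backward()
optimizer.step()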

Usage

import torch
import open_clip
from PIL import Image

# Load the ViT-H-14 backbone and the fine-tuned weights
model, _, preprocess = open_clip.create_model_and_transforms('ViT-H-14')
tokenizer = open_clip.get_tokenizer('ViT-H-14')
model.load_state_dict(torch.load('pytorch_model.bin', map_location='cpu'))
model.eval()

# Prepare image and text (encode_text expects token IDs, not a raw string)
image = Image.open('your_image.jpg').convert('RGB')
image = preprocess(image).unsqueeze(0)
text = tokenizer(["your text description"])

# Get embeddings
with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    
    # Normalize to unit length so the dot product below is cosine similarity
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    
    # Calculate similarity
    similarity = (image_features @ text_features.t()).item()
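
Building on the snippet above, a hypothetical extension ranks several candidate captions for one image. The captions are placeholders, and the 100x logit scale mirrors the common CLIP zero-shot convention rather than a value taken from this card.

captions = ["a photo of a dog", "a photo of a cat", "a city skyline at night"]
text_tokens = tokenizer(captions)
with torch.no_grad():
    text_features = model.encode_text(text_tokens)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    # Softmax over scaled cosine similarities gives a probability per caption
    probs = (100.0 * image_features @ text_features.t()).softmax(dim=-1)
print(list(zip(captions, probs.squeeze(0).tolist())))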