# CLIP ViT-H-14 Fine-tuned on Polaris Dataset
This model is a fine-tuned version of CLIP ViT-H-14 on the Polaris dataset, trained on one-to-one image-text pairs.
## Model Details
- Base Model: CLIP ViT-H-14
- Dataset: Polaris
- Training Mode: One-to-one image-text pairs (contrastive objective sketched after this list)
- Architecture: Vision Transformer (ViT) with CLIP text encoder
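
The card does not specify the training objective beyond the one-to-one pairing. For illustration only, here is a minimal sketch of the symmetric contrastive (InfoNCE) loss that CLIP-style models typically use for one-to-one image-text pairs; the function name and tensor shapes are assumptions, not taken from this repository.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_features, text_features, logit_scale):
    """Symmetric InfoNCE loss over a batch of one-to-one image-text pairs.

    Assumes features are L2-normalized with shape [batch, dim].
    """
    # Scaled cosine-similarity logits between every image and every text
    logits_per_image = logit_scale * image_features @ text_features.t()
    logits_per_text = logits_per_image.t()
    # One-to-one pairing: the i-th image matches the i-th text
    labels = torch.arange(image_features.shape[0], device=image_features.device)
    return (F.cross_entropy(logits_per_image, labels)
            + F.cross_entropy(logits_per_text, labels)) / 2
```

Under this objective, each matching pair is pulled together in embedding space while every other in-batch pairing acts as a negative.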
## Usage
```python
import torch
import open_clip
from PIL import Image

# Load the base architecture and the fine-tuned weights
model, _, preprocess = open_clip.create_model_and_transforms('ViT-H-14')
model.load_state_dict(torch.load('pytorch_model.bin', map_location='cpu'))
model.eval()
tokenizer = open_clip.get_tokenizer('ViT-H-14')

# Prepare image and text (the text must be tokenized before encoding)
image = preprocess(Image.open('your_image.jpg')).unsqueeze(0)
text = tokenizer(["your text description"])

# Get embeddings
with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

# Normalize features to unit length
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)

# Cosine similarity between the image and the text
similarity = (image_features @ text_features.t()).item()
```
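
Beyond scoring a single pair, the same embeddings support zero-shot ranking over several candidate captions. A minimal sketch, assuming the same checkpoint as above and placeholder candidate labels:

```python
import torch
import open_clip
from PIL import Image

# Same loading steps as above
model, _, preprocess = open_clip.create_model_and_transforms('ViT-H-14')
model.load_state_dict(torch.load('pytorch_model.bin', map_location='cpu'))
model.eval()
tokenizer = open_clip.get_tokenizer('ViT-H-14')

image = preprocess(Image.open('your_image.jpg')).unsqueeze(0)
candidates = ["a photo of a cat", "a photo of a dog", "a photo of a car"]  # placeholders
text = tokenizer(candidates)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    # Softmax over scaled similarities ranks the candidate captions
    probs = (100.0 * image_features @ text_features.t()).softmax(dim=-1)

for label, p in zip(candidates, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```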