---
datasets:
- SPRIGHT-T2I/spright_coco
---
## A fine-tune of [BeichenZhang/LongCLIP-L](https://huggingface.co/BeichenZhang/LongCLIP-L) -- Long-CLIP ViT-L/14 expanded to 248 tokens.

----
# 🚨 IMPORTANT NOTE for loading with HuggingFace Transformers: 🚨

```
from transformers import CLIPModel, CLIPProcessor

model_id = "zer0int/LongCLIP-GmP-ViT-L-14"

model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)
```
# ❌ Error due to a mismatch with the 77-token limit defined in the Transformers library

# 👇
# Option 1 (simple & worse):
Truncate to 77 tokens by loading with:
`CLIPModel.from_pretrained(model_id, ignore_mismatched_sizes=True)`
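
For clarity, a minimal sketch of what Option 1 looks like end to end; the `truncation`, `max_length`, and `return_tensors` arguments are standard `transformers` usage added for illustration, not part of the original note (the prompts mirror the comparison below):

```
from transformers import CLIPModel, CLIPProcessor

model_id = "zer0int/LongCLIP-GmP-ViT-L-14"

# Load despite the 248-vs-77 position-embedding mismatch described above:
model = CLIPModel.from_pretrained(model_id, ignore_mismatched_sizes=True)
processor = CLIPProcessor.from_pretrained(model_id)

# Prompts are padded/truncated to the library default of 77 tokens:
text_input = processor(text=["photo of a cat", "picture of a dog", "cat", "dog"],
                       padding="max_length", max_length=77, truncation=True,
                       return_tensors="pt")
```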

```
# Cosine similarities for 77 tokens are WORSE:
# tensor[photo of a cat, picture of a dog, cat, dog] # image ground truth: cat photo
tensor([[0.16484, 0.0749, 0.1618, 0.0774]], device='cuda:0') 👎
```
# 👇
# Option 2 (edit Transformers) 👍 RECOMMENDED 👍:

- 📝 Find the line that says `max_position_embeddings=77,` in `[System Python]/site-packages/transformers/models/clip/configuration_clip.py`
- 📝 Change it to: `max_position_embeddings=248,`
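
To verify the edit took effect, a quick sanity check (an illustrative addition, not from the original note; it assumes the text config reports 248 positions after the change):

```
from transformers import CLIPModel

# Re-load after editing configuration_clip.py and inspect the text tower's limit:
model = CLIPModel.from_pretrained("zer0int/LongCLIP-GmP-ViT-L-14")
print(model.config.text_config.max_position_embeddings)  # should now be 248
```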

# Now, in your inference code, for text:
- `text_input = processor([your-prompt-or-prompts-as-usual], padding="max_length", max_length=248)`
- or:
- `text_input = processor([your-prompt-or-prompts-as-usual], padding=True)`

```
# Resulting Cosine Similarities for 248 tokens padded:
# tensor[photo of a cat, picture of a dog, cat, dog] -- image ground truth: cat photo
tensor([[0.2128, 0.0978, 0.1957, 0.1133]], device='cuda:0') ✅
```
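
For completeness, a minimal end-to-end sketch of how image-vs-prompt cosine similarities like the ones above can be computed (assumes the Option 2 edit has been applied; the `cat.jpg` path and the explicit normalization are illustrative additions, not from the original note):

```
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "zer0int/LongCLIP-GmP-ViT-L-14"
device = "cuda" if torch.cuda.is_available() else "cpu"

model = CLIPModel.from_pretrained(model_id).to(device)
processor = CLIPProcessor.from_pretrained(model_id)

prompts = ["photo of a cat", "picture of a dog", "cat", "dog"]
image = Image.open("cat.jpg")  # hypothetical example image (ground truth: a cat photo)

text_input = processor(text=prompts, padding="max_length", max_length=248, return_tensors="pt").to(device)
image_input = processor(images=image, return_tensors="pt").to(device)

with torch.no_grad():
    text_emb = model.get_text_features(**text_input)
    image_emb = model.get_image_features(**image_input)

# L2-normalize, then the dot product is the cosine similarity per prompt:
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
print(image_emb @ text_emb.T)  # the cat prompts should score highest for a cat image
```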

----
## Update 12/AUG/2024:
New *BEST* model, custom loss with label smoothing.