megaelius committed
Commit 0453e48 · verified · 1 Parent(s): f671556

Update README.md

Files changed (1): README.md (+56 -2)

README.md CHANGED
pinned: false
---

# Robustness in Both Domains: CLIP Needs a Robust Text Encoder

<span style="color:rgb(255, 0, 0);">Elias Abad Rocamora</span>, <span style="color:rgb(133 203 210);">Christian Schlarmann</span>, <span style="color:rgb(133 203 210);">Naman Deep Singh</span>, <span style="color:rgb(255, 0, 0);">Yongtao Wu</span>, <span style="color:rgb(133 203 210);">Matthias Hein</span> and <span style="color:rgb(255, 0, 0);">Volkan Cevher</span>

<span style="color:rgb(255, 0, 0);">LIONS @ EPFL</span> and <span style="color:rgb(133 203 210);">Tübingen AI Center</span>

In this repo, you will find all the models trained for our paper.

### Loading CLIPModels

You can load our models like any other CLIP model. For example, `LEAF-CLIP/CLIP-ViT-L-rho50-k1-constrained-FARE2` can be loaded by following the `openai/clip-vit-large-patch14` example snippet:

```python
from PIL import Image
import requests

from transformers import CLIPProcessor, CLIPModel

model_name = "LEAF-CLIP/CLIP-ViT-L-rho50-k1-constrained-FARE2"
processor_name = "openai/clip-vit-large-patch14"

model = CLIPModel.from_pretrained(model_name)
processor = CLIPProcessor.from_pretrained(processor_name)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1)  # we can take the softmax to get the label probabilities
```

When loading other model sizes, the `processor_name` needs to be changed accordingly:

| Model Size | Processor Name |
| - | - |
| ViT-L-14 | `"openai/clip-vit-large-patch14"` |
| ViT-H-14 | `"laion/CLIP-ViT-H-14-laion2B-s32B-b79K"` |
| ViT-g-14 | `"laion/CLIP-ViT-g-14-laion2B-s12B-b42K"` |
| ViT-bigG-14 | `"laion/CLIP-ViT-bigG-14-laion2B-39B-b160k"` |
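
For instance, a minimal sketch (not from the original README) of pairing a checkpoint with its matching processor; the checkpoint name below is the ViT-L one from the snippet above, so swap in the checkpoint and model size you actually use:

```python
from transformers import CLIPModel, CLIPProcessor

# Processor names copied from the table above.
PROCESSOR_BY_SIZE = {
    "ViT-L-14": "openai/clip-vit-large-patch14",
    "ViT-H-14": "laion/CLIP-ViT-H-14-laion2B-s32B-b79K",
    "ViT-g-14": "laion/CLIP-ViT-g-14-laion2B-s12B-b42K",
    "ViT-bigG-14": "laion/CLIP-ViT-bigG-14-laion2B-39B-b160k",
}

model_name = "LEAF-CLIP/CLIP-ViT-L-rho50-k1-constrained-FARE2"  # replace with the checkpoint you use
model_size = "ViT-L-14"                                         # its architecture size

model = CLIPModel.from_pretrained(model_name)
processor = CLIPProcessor.from_pretrained(PROCESSOR_BY_SIZE[model_size])
```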

### Loading CLIPTextModels

If you just need the text encoder, you can load it with the following snippet:

```python
from transformers import CLIPTokenizer, CLIPTextModel

model_name = "LEAF-CLIP/CLIP-ViT-L-rho50-k1-constrained-FARE2"
processor_name = "openai/clip-vit-large-patch14"

model = CLIPTextModel.from_pretrained(model_name)
tokenizer = CLIPTokenizer.from_pretrained(processor_name)

inputs = tokenizer(["a photo of a cat", "a photo of a dog"], padding=True, return_tensors="pt")

outputs = model(**inputs)
last_hidden_state = outputs.last_hidden_state
pooled_output = outputs.pooler_output  # pooled (EOS token) states
```
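
Note that `CLIPTextModel` returns encoder hidden states; it does not apply the text projection into the joint image-text space. A small sketch (not part of the original README, assuming the same checkpoint and tokenizer as above) of obtaining projected text embeddings via `CLIPModel.get_text_features`:

```python
import torch
from transformers import CLIPTokenizer, CLIPModel

model_name = "LEAF-CLIP/CLIP-ViT-L-rho50-k1-constrained-FARE2"
processor_name = "openai/clip-vit-large-patch14"

model = CLIPModel.from_pretrained(model_name)
tokenizer = CLIPTokenizer.from_pretrained(processor_name)

inputs = tokenizer(["a photo of a cat", "a photo of a dog"], padding=True, return_tensors="pt")

with torch.no_grad():
    text_embeds = model.get_text_features(**inputs)  # applies the text projection

# Normalize if you want cosine similarities between prompts.
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
```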

### Acknowledgements

Our codebase is based on the [OpenCLIP codebase](https://github.com/mlfoundations/open_clip). We appreciate the effort of the OpenCLIP team and the release of their code and model weights.