---
tags:
- image-classification
- birder
- pytorch
library_name: birder
license: apache-2.0
base_model:
- birder-project/vit_l16_mim
---

# Model Card for vit_l16_mim-eu-common

A ViT image classification model. The model follows a two-stage training process: first pre-trained with masked image modeling, then fine-tuned on the `eu-common` dataset.

The species list is derived from the Collins bird guide [^1].

[^1]: Svensson, L., Mullarney, K., & Zetterström, D. (2022). Collins bird guide (3rd ed.). London, England: William Collins.

## Model Details

- **Model Type:** Image classification and detection backbone
- **Model Stats:**
  - Params (M): 304.1
  - Input image size: 256 x 256
- **Dataset:** eu-common (707 classes)

- **Papers:**
  - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: <https://arxiv.org/abs/2010.11929>
  - Masked Autoencoders Are Scalable Vision Learners: <https://arxiv.org/abs/2111.06377>

## Model Usage

### Image Classification

```python
import birder
from birder.inference.classification import infer_image

(net, model_info) = birder.load_pretrained_model("vit_l16_mim-eu-common", inference=True)

# Get the image size the model was trained on
size = birder.get_size_from_signature(model_info.signature)

# Create an inference transform
transform = birder.classification_transform(size, model_info.rgb_stats)

image = "path/to/image.jpeg"  # or a PIL image, must be loaded in RGB format
(out, _) = infer_image(net, image, transform)
# out is a NumPy array with shape (1, 707), representing class probabilities
```

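To turn the probability array into a readable prediction, you can pick out the highest-scoring classes with plain NumPy. The sketch below runs on a dummy `(1, 707)` array in place of the real model output, and `class_names` is a purely illustrative stand-in for whatever label mapping accompanies the model:

```python
import numpy as np

# Dummy stand-in for the (1, 707) probability array returned by infer_image
rng = np.random.default_rng(0)
out = rng.random((1, 707))
out = out / out.sum(axis=1, keepdims=True)  # normalize to probabilities

# Hypothetical label list; the real mapping ships with the model/dataset
class_names = [f"species_{i}" for i in range(out.shape[1])]

# Indices of the top-5 probabilities, highest first
top5 = np.argsort(out[0])[::-1][:5]
for idx in top5:
    print(f"{class_names[idx]}: {out[0, idx]:.4f}")
```

The same `argsort`-and-reverse pattern works unchanged on the real `out` array.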
### Image Embeddings

```python
import birder
from birder.inference.classification import infer_image

(net, model_info) = birder.load_pretrained_model("vit_l16_mim-eu-common", inference=True)

# Get the image size the model was trained on
size = birder.get_size_from_signature(model_info.signature)

# Create an inference transform
transform = birder.classification_transform(size, model_info.rgb_stats)

image = "path/to/image.jpeg"  # or a PIL image
(out, embedding) = infer_image(net, image, transform, return_embedding=True)
# embedding is a NumPy array with shape (1, 1024)
```

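A common use for these embeddings is comparing images, e.g. for retrieval or de-duplication, via cosine similarity. A minimal sketch, using dummy `(1, 1024)` arrays in place of real embeddings (the helper function is illustrative, not part of the birder API):

```python
import numpy as np

# Dummy stand-ins for (1, 1024) embeddings from infer_image(..., return_embedding=True)
rng = np.random.default_rng(0)
emb_a = rng.standard_normal((1, 1024))
emb_b = rng.standard_normal((1, 1024))

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Flatten to 1-D and compute the normalized dot product
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(f"a vs b: {cosine_similarity(emb_a, emb_b):.4f}")
print(f"a vs a: {cosine_similarity(emb_a, emb_a):.4f}")  # identical vectors -> 1.0
```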
### Detection Feature Map

```python
from PIL import Image
import birder

(net, model_info) = birder.load_pretrained_model("vit_l16_mim-eu-common", inference=True)

# Get the image size the model was trained on
size = birder.get_size_from_signature(model_info.signature)

# Create an inference transform
transform = birder.classification_transform(size, model_info.rgb_stats)

image = Image.open("path/to/image.jpeg")
features = net.detection_features(transform(image).unsqueeze(0))
# features is a dict (stage name -> torch.Tensor)
print([(k, v.size()) for k, v in features.items()])
# Output example:
# [('neck', torch.Size([1, 1024, 16, 16]))]
```

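If you need a single vector per image from a feature map like the one above (for retrieval or lightweight probing), one common approach is global average pooling over the spatial dimensions. A minimal NumPy sketch on a dummy tensor of the `'neck'` shape printed above; this pooling step is illustrative, not part of the birder API:

```python
import numpy as np

# Dummy stand-in for the 'neck' feature map: (batch, channels, height, width)
feature_map = np.ones((1, 1024, 16, 16), dtype=np.float32)

# Global average pooling over the spatial dimensions (H, W)
pooled = feature_map.mean(axis=(2, 3))
print(pooled.shape)  # (1, 1024)
```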
## Citation

```bibtex
@misc{dosovitskiy2021imageworth16x16words,
      title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
      author={Alexey Dosovitskiy and Lucas Beyer and Alexander Kolesnikov and Dirk Weissenborn and Xiaohua Zhai and Thomas Unterthiner and Mostafa Dehghani and Matthias Minderer and Georg Heigold and Sylvain Gelly and Jakob Uszkoreit and Neil Houlsby},
      year={2021},
      eprint={2010.11929},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2010.11929},
}

@misc{he2021maskedautoencodersscalablevision,
      title={Masked Autoencoders Are Scalable Vision Learners},
      author={Kaiming He and Xinlei Chen and Saining Xie and Yanghao Li and Piotr Dollár and Ross Girshick},
      year={2021},
      eprint={2111.06377},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2111.06377},
}
```