---
license: apache-2.0
datasets:
- Elriggs/imagenet-50-subset
language:
- en
base_model:
- google/siglip2-base-patch16-224
pipeline_tag: image-classification
library_name: transformers
tags:
- image-net
- 50-class
- photo
---

![1.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/fEr855hGJbUzyzMVxfolJ.png)

# **imagenet-50-subset**

> **imagenet-50-subset** is a vision-language encoder model fine-tuned from **google/siglip2-base-patch16-224** for **multi-class image classification**. It is trained to classify images into a subset of 50 categories derived from the **ImageNet** dataset using the **SiglipForImageClassification** architecture.

```py
Classification Report:
                          precision    recall  f1-score   support

                   tench     0.9878    0.9911    0.9895       900
                goldfish     0.9945    0.9956    0.9950       900
       great white shark     0.9339    0.8944    0.9137       900
             tiger shark     0.8957    0.8967    0.8962       900
              hammerhead     0.9300    0.9589    0.9442       900
            electric ray     0.8788    0.8622    0.8704       900
                stingray     0.8689    0.8911    0.8799       900
                    cock     0.9000    0.9200    0.9099       900
                     hen     0.9162    0.8867    0.9012       900
                 ostrich     0.9945    0.9989    0.9967       900
               brambling     0.9671    0.9478    0.9574       900
               goldfinch     0.9867    0.9911    0.9889       900
             house finch     0.9629    0.9811    0.9719       900
                   junco     0.9583    0.9700    0.9641       900
          indigo bunting     0.9933    0.9911    0.9922       900
                   robin     0.9888    0.9811    0.9849       900
                  bulbul     0.9735    0.9811    0.9773       900
                     jay     0.9855    0.9789    0.9822       900
                  magpie     0.9776    0.9700    0.9738       900
               chickadee     0.9834    0.9844    0.9839       900
             water ouzel     0.9680    0.9744    0.9712       900
                    kite     0.9512    0.9522    0.9517       900
              bald eagle     0.9843    0.9722    0.9782       900
                 vulture     0.9562    0.9700    0.9630       900
          great grey owl     0.9989    0.9944    0.9967       900
european fire salamander     0.9330    0.9278    0.9304       900
             common newt     0.7969    0.7933    0.7951       900
                     eft     0.9162    0.8989    0.9075       900
      spotted salamander     0.9249    0.9300    0.9274       900
                 axolotl     0.9888    0.9767    0.9827       900
                bullfrog     0.9116    0.9167    0.9141       900
               tree frog     0.9108    0.9533    0.9316       900
             tailed frog     0.8658    0.8100    0.8370       900
              loggerhead     0.8657    0.8956    0.8804       900
      leatherback turtle     0.9038    0.8667    0.8849       900
              mud turtle     0.7980    0.7111    0.7521       900
                terrapin     0.7039    0.7844    0.7420       900
              box turtle     0.8576    0.8633    0.8605       900
            banded gecko     0.9255    0.9111    0.9183       900
           common iguana     0.9033    0.9133    0.9083       900
      american chameleon     0.6577    0.7622    0.7061       900
                whiptail     0.8351    0.8722    0.8533       900
                   agama     0.9010    0.8900    0.8955       900
          frilled lizard     0.9674    0.9233    0.9449       900
        alligator lizard     0.8862    0.8822    0.8842       900
            gila monster     0.9821    0.9733    0.9777       900
            green lizard     0.6574    0.5756    0.6137       900
       african chameleon     0.9573    0.9711    0.9641       900
           komodo dragon     0.9693    0.9811    0.9752       900
       african crocodile     0.9769    0.9878    0.9823       900

                accuracy                         0.9181     45000
               macro avg     0.9186    0.9181    0.9181     45000
            weighted avg     0.9186    0.9181    0.9181     45000
```

---

## **Label Space: 50 Classes**

The model classifies each image into one of the following categories:

```
0: tench
1: goldfish
2: great white shark
3: tiger shark
4: hammerhead
5: electric ray
6: stingray
7: cock
8: hen
9: ostrich
10: brambling
11: goldfinch
12: house finch
13: junco
14: indigo bunting
15: robin
16: bulbul
17: jay
18: magpie
19: chickadee
20: water ouzel
21: kite
22: bald eagle
23: vulture
24: great grey owl
25: european fire salamander
26: common newt
27: eft
28: spotted salamander
29: axolotl
30: bullfrog
31: tree frog
32: tailed frog
33: loggerhead
34: leatherback turtle
35: mud turtle
36: terrapin
37: box turtle
38: banded gecko
39: common iguana
40: american chameleon
41: whiptail
42: agama
43: frilled lizard
44: alligator lizard
45: gila monster
46: green lizard
47: african chameleon
48: komodo dragon
49: african crocodile
```

---

## **Install Dependencies**

```bash
pip install -q transformers torch pillow gradio
```

---

## **Inference Code**

```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/imagenet-50-subset"  # Replace if different
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Label mapping
id2label = {
    "0": "tench",
    "1": "goldfish",
    "2": "great white shark",
    "3": "tiger shark",
    "4": "hammerhead",
    "5": "electric ray",
    "6": "stingray",
    "7": "cock",
    "8": "hen",
    "9": "ostrich",
    "10": "brambling",
    "11": "goldfinch",
    "12": "house finch",
    "13": "junco",
    "14": "indigo bunting",
    "15": "robin",
    "16": "bulbul",
    "17": "jay",
    "18": "magpie",
    "19": "chickadee",
    "20": "water ouzel",
    "21": "kite",
    "22": "bald eagle",
    "23": "vulture",
    "24": "great grey owl",
    "25": "european fire salamander",
    "26": "common newt",
    "27": "eft",
    "28": "spotted salamander",
    "29": "axolotl",
    "30": "bullfrog",
    "31": "tree frog",
    "32": "tailed frog",
    "33": "loggerhead",
    "34": "leatherback turtle",
    "35": "mud turtle",
    "36": "terrapin",
    "37": "box turtle",
    "38": "banded gecko",
    "39": "common iguana",
    "40": "american chameleon",
    "41": "whiptail",
    "42": "agama",
    "43": "frilled lizard",
    "44": "alligator lizard",
    "45": "gila monster",
    "46": "green lizard",
    "47": "african chameleon",
    "48": "komodo dragon",
    "49": "african crocodile"
}

def classify_imagenet_50(image):
    # Gradio passes a NumPy array; convert it to an RGB PIL image
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Map each class index to its label and rounded probability
    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }
    return prediction

# Gradio Interface
iface = gr.Interface(
    fn=classify_imagenet_50,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=5, label="ImageNet-50 Classification"),
    title="imagenet-50-subset",
    description="Upload an image to classify it into one of 50 selected ImageNet categories."
)

if __name__ == "__main__":
    iface.launch()
```

---

## **Intended Use**

**imagenet-50-subset** can be used for:

* **Benchmarking Lightweight Vision Models** – Quick testing on a meaningful subset of ImageNet classes.
* **Educational Demos** – Teaching about classification tasks with a simpler label space.
* **Prototype Deployment** – Use in applications where full ImageNet coverage is unnecessary.
* **Dataset Analysis** – Classification-based filtering of visual content into known object classes.
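
For the dataset-analysis use case, one possible way to filter images is to keep only those whose top prediction clears a confidence threshold. The following is a minimal sketch, assuming a label-to-probability dictionary shaped like the one returned by `classify_imagenet_50`. The helper name `top_k_confident`, the default threshold of 0.8, and the example scores are all hypothetical and not part of the model or its API.

```python
from typing import Dict, List, Tuple


def top_k_confident(
    scores: Dict[str, float], k: int = 5, threshold: float = 0.8
) -> Tuple[bool, List[Tuple[str, float]]]:
    """Hypothetical helper: rank labels by probability and decide whether to keep the image.

    Returns (keep, top_k), where keep is True only if the highest-scoring
    label has probability >= threshold.
    """
    # Sort (label, probability) pairs from most to least likely
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top_k = ranked[:k]
    # An empty score dictionary is never kept
    keep = bool(top_k) and top_k[0][1] >= threshold
    return keep, top_k


# Made-up scores for illustration only (not real model output)
example_scores = {"goldfish": 0.91, "tench": 0.05, "axolotl": 0.04}
keep, top = top_k_confident(example_scores, k=2, threshold=0.8)
print(keep, top)
```

In a filtering pipeline, `scores` would be the dictionary produced for each image. A single global threshold is only a starting point: per-class results vary widely in the classification report (for example, green lizard has an F1 of 0.6137), so harder classes may need a different cutoff or manual review.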