Multiclass Image Classification 05142025
IMAGENETTE is an image classification model fine-tuned from google/siglip2-base-patch16-224, a SigLIP 2 vision-language encoder, using the SiglipForImageClassification architecture. It classifies images into the 10 categories of the Imagenette dataset, a 10-class subset of ImageNet.
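As we understand the `transformers` implementation of `SiglipForImageClassification`, the classification head is a single linear layer applied to the mean of the vision encoder's patch embeddings. The sketch below reproduces only that head with random tensors to show the shapes involved. It assumes the base model's hidden size of 768 and 196 patch tokens (a 224 x 224 input cut into 16 x 16 patches gives 14 x 14 = 196). The weights are random, not the fine-tuned ones.

```python
import torch
import torch.nn as nn

# Assumed dimensions for google/siglip2-base-patch16-224 (illustrative only)
HIDDEN_SIZE = 768                # width of the base-size vision encoder
NUM_PATCHES = (224 // 16) ** 2   # 14 x 14 = 196 patch tokens
NUM_LABELS = 10                  # Imagenette classes

# Stand-in for the encoder output: (batch, num_patches, hidden_size)
last_hidden_state = torch.randn(2, NUM_PATCHES, HIDDEN_SIZE)

# Mean-pool over the patch dimension to get one vector per image
pooled = last_hidden_state.mean(dim=1)           # shape (2, 768)

# Linear classifier head (randomly initialized here) mapping to 10 logits
classifier = nn.Linear(HIDDEN_SIZE, NUM_LABELS)
logits = classifier(pooled)                       # shape (2, 10)

# Softmax turns each row of logits into probabilities summing to 1
probs = logits.softmax(dim=-1)
print(tuple(logits.shape), probs.sum(dim=-1))
```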
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features https://arxiv.org/pdf/2502.14786
ImageNet Large Scale Visual Recognition Challenge https://arxiv.org/pdf/1409.0575
Classification Report:
                 precision    recall  f1-score   support

           tench    0.9885    0.9834    0.9859       963
english springer    0.9843    0.9822    0.9832       955
 cassette player    0.9544    0.9486    0.9515       993
       chain saw    0.9257    0.8998    0.9125       858
          church    0.9654    0.9798    0.9726       941
     French horn    0.9757    0.9665    0.9711       956
   garbage truck    0.8883    0.9761    0.9301       961
        gas pump    0.9366    0.9044    0.9202       931
       golf ball    0.9925    0.9716    0.9819       951
       parachute    0.9821    0.9708    0.9764       960

        accuracy                            0.9590      9469
       macro avg    0.9593    0.9583    0.9586      9469
    weighted avg    0.9597    0.9590    0.9591      9469
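The summary rows can be derived from the per-class rows. The macro average is the unweighted mean over the 10 classes, and the weighted average weights each class by its support. The sketch below recomputes both from the table's values. Because those values are rounded to four decimals, the results agree only to about +/-0.0005. It also shows why the weighted-average recall matches the accuracy: recall times support is the number of correct predictions for that class, so summing over classes and dividing by the total support gives the overall accuracy.

```python
# Per-class (precision, recall, f1-score, support), copied from the report
rows = {
    "tench":            (0.9885, 0.9834, 0.9859, 963),
    "english springer": (0.9843, 0.9822, 0.9832, 955),
    "cassette player":  (0.9544, 0.9486, 0.9515, 993),
    "chain saw":        (0.9257, 0.8998, 0.9125, 858),
    "church":           (0.9654, 0.9798, 0.9726, 941),
    "French horn":      (0.9757, 0.9665, 0.9711, 956),
    "garbage truck":    (0.8883, 0.9761, 0.9301, 961),
    "gas pump":         (0.9366, 0.9044, 0.9202, 931),
    "golf ball":        (0.9925, 0.9716, 0.9819, 951),
    "parachute":        (0.9821, 0.9708, 0.9764, 960),
}

# Total number of evaluated images
total = sum(r[3] for r in rows.values())

def macro(idx):
    """Unweighted mean of one metric column over all classes."""
    return sum(r[idx] for r in rows.values()) / len(rows)

def weighted(idx):
    """Support-weighted mean of one metric column."""
    return sum(r[idx] * r[3] for r in rows.values()) / total

for name, idx in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(f"{name:>9}  macro={macro(idx):.4f}  weighted={weighted(idx):.4f}")

# Weighted recall = (sum of correct predictions) / total = accuracy
print(f"accuracy ~ {weighted(1):.4f} over {total} images")
```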
The model predicts one of the following image classes:
0: tench
1: english springer
2: cassette player
3: chain saw
4: church
5: French horn
6: garbage truck
7: gas pump
8: golf ball
9: parachute
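Each index is the position of that class in the model's 10-dimensional output. The predicted class is therefore the label at the index of the largest logit; softmax only rescales the scores into probabilities and does not change which index is largest. The minimal pure-Python sketch below illustrates this. The logits are made up for illustration and do not come from the model.

```python
import math

# Class names in index order, as listed above
LABELS = [
    "tench", "english springer", "cassette player", "chain saw", "church",
    "French horn", "garbage truck", "gas pump", "golf ball", "parachute",
]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def predict(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

# Hypothetical logits for one image; index 8 has the largest score
example_logits = [0.1, -1.2, 0.3, -0.5, 0.0, 0.2, -0.8, 0.4, 4.5, 1.0]
label, p = predict(example_logits)
print(label, round(p, 3))  # the label printed is "golf ball"
```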
pip install -q transformers torch pillow gradio hf_xet
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/IMAGENETTE"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Label mapping (output index -> class name)
id2label = {
    "0": "tench",
    "1": "english springer",
    "2": "cassette player",
    "3": "chain saw",
    "4": "church",
    "5": "French horn",
    "6": "garbage truck",
    "7": "gas pump",
    "8": "golf ball",
    "9": "parachute"
}

def classify_image(image):
    # Gradio passes a NumPy array; convert it to an RGB PIL image
    image = Image.fromarray(image).convert("RGB")
    # Preprocess (resize and normalize) into model inputs
    inputs = processor(images=image, return_tensors="pt")

    # Inference only: disable gradient tracking
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        # Convert the logits for the single image into probabilities
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Map each class name to its probability, rounded to 3 decimals
    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }
    return prediction

# Gradio Interface: shows the 3 most likely classes
iface = gr.Interface(
    fn=classify_image,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=3, label="Image Classification"),
    title="IMAGENETTE - SigLIP2 Classifier",
    description="Upload an image to classify it into one of 10 categories from the Imagenette dataset."
)

if __name__ == "__main__":
    iface.launch()
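`gr.Label(num_top_classes=3)` displays only the three highest-probability entries of the dictionary returned by `classify_image`. Outside Gradio, you can get the same ranking with `torch.topk`. In the sketch below, a hypothetical logits tensor stands in for `model(**inputs).logits`, so it runs without downloading the model. The helper name `top_k` is illustrative.

```python
import torch

# Class names in index order (same order as the model's outputs)
labels = [
    "tench", "english springer", "cassette player", "chain saw", "church",
    "French horn", "garbage truck", "gas pump", "golf ball", "parachute",
]

def top_k(logits: torch.Tensor, k: int = 3):
    """Return the k most likely (label, probability) pairs for one image.

    `logits` is expected to have shape (1, num_labels), like model(**inputs).logits.
    """
    probs = torch.softmax(logits, dim=-1).squeeze(0)
    values, indices = torch.topk(probs, k)  # sorted, highest first
    return [(labels[i], round(v, 3)) for v, i in zip(values.tolist(), indices.tolist())]

# Hypothetical logits standing in for a real model output
fake_logits = torch.tensor([[0.2, 0.1, 3.0, 1.5, 0.0, -0.3, 2.0, 0.4, -1.0, 0.3]])
result = top_k(fake_logits)
print(result)
```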
IMAGENETTE is designed for:
Base model: google/siglip2-base-patch16-224