IMAGENETTE

IMAGENETTE is an image classifier fine-tuned from google/siglip2-base-patch16-224, a SigLIP 2 vision-language encoder, for single-label, multi-class image classification. It assigns each image to one of the 10 classes of the Imagenette dataset, a subset of ImageNet, using the SiglipForImageClassification architecture.

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features https://arxiv.org/pdf/2502.14786

ImageNet Large Scale Visual Recognition Challenge https://arxiv.org/pdf/1409.0575

Classification Report:
                  precision    recall  f1-score   support

           tench     0.9885    0.9834    0.9859       963
english springer     0.9843    0.9822    0.9832       955
 cassette player     0.9544    0.9486    0.9515       993
       chain saw     0.9257    0.8998    0.9125       858
          church     0.9654    0.9798    0.9726       941
     French horn     0.9757    0.9665    0.9711       956
   garbage truck     0.8883    0.9761    0.9301       961
        gas pump     0.9366    0.9044    0.9202       931
       golf ball     0.9925    0.9716    0.9819       951
       parachute     0.9821    0.9708    0.9764       960

        accuracy                         0.9590      9469
       macro avg     0.9593    0.9583    0.9586      9469
    weighted avg     0.9597    0.9590    0.9591      9469
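
The summary rows of the report can be recomputed from the per-class rows. The sketch below is illustrative only: it copies the rounded values from the table, so results agree to roughly four decimals, and the names `rows`, `f1`, `macro_recall`, and `weighted_recall` are hypothetical helpers, not part of any library.

```python
# Per-class (precision, recall, support), copied from the report above.
rows = {
    "tench": (0.9885, 0.9834, 963),
    "english springer": (0.9843, 0.9822, 955),
    "cassette player": (0.9544, 0.9486, 993),
    "chain saw": (0.9257, 0.8998, 858),
    "church": (0.9654, 0.9798, 941),
    "French horn": (0.9757, 0.9665, 956),
    "garbage truck": (0.8883, 0.9761, 961),
    "gas pump": (0.9366, 0.9044, 931),
    "golf ball": (0.9925, 0.9716, 951),
    "parachute": (0.9821, 0.9708, 960),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


total_support = sum(s for _, _, s in rows.values())

# Macro average: every class counts equally.
macro_recall = sum(r for _, r, _ in rows.values()) / len(rows)

# Weighted average: each class counts in proportion to its support.
# For recall, this is the same quantity as overall accuracy.
weighted_recall = sum(r * s for _, r, s in rows.values()) / total_support

print(f"tench f1        = {f1(*rows['tench'][:2]):.4f}")
print(f"macro recall    = {macro_recall:.4f}")
print(f"weighted recall = {weighted_recall:.4f}")
print(f"total support   = {total_support}")
```

Read this way, the garbage truck row (precision 0.8883, recall 0.9761) suggests the model over-predicts that class, while chain saw and gas pump have the lowest recall.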

Label Space: 10 Classes

The model predicts one of the following image classes:

0: tench
1: english springer
2: cassette player
3: chain saw
4: church
5: French horn
6: garbage truck
7: gas pump
8: golf ball
9: parachute
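
The index-to-name mapping above is what turns a raw prediction into a readable label. A minimal sketch, assuming the model emits one score per class in the order listed; `LABELS` and `decode` are hypothetical names, and the scores are made up for illustration.

```python
# Class names in index order, as listed above.
LABELS = [
    "tench", "english springer", "cassette player", "chain saw", "church",
    "French horn", "garbage truck", "gas pump", "golf ball", "parachute",
]


def decode(scores):
    """Return (index, label) of the highest-scoring class."""
    if len(scores) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} scores, got {len(scores)}")
    index = max(range(len(scores)), key=scores.__getitem__)
    return index, LABELS[index]


# Made-up scores for illustration only; index 4 holds the largest value.
example_scores = [0.1, 0.0, 0.2, 0.1, 2.3, 0.4, 0.0, 0.1, 0.3, 0.2]
print(decode(example_scores))  # (4, 'church')
```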

Install Dependencies

pip install -q transformers torch pillow gradio hf_xet
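
After installing, it can help to confirm that each package is importable before running the demo. The check below is a small sketch using only the standard library; `REQUIRED` and `missing_modules` are hypothetical names, and the import names are assumptions about how each package is exposed (pillow is imported as `PIL`).

```python
import importlib.util

# Assumed import names for the packages installed above.
REQUIRED = ["transformers", "torch", "PIL", "gradio", "hf_xet"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
print("missing:", missing or "none")
```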

Inference Code

import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/IMAGENETTE"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Label mapping
id2label = {
    "0": "tench",
    "1": "english springer",
    "2": "cassette player",
    "3": "chain saw",
    "4": "church",
    "5": "French horn",
    "6": "garbage truck",
    "7": "gas pump",
    "8": "golf ball",
    "9": "parachute"
}

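def classify_image(image):
    """Return a {label: probability} dict for one image given as a NumPy array."""
    if image is None:  # Gradio passes None when nothing was uploaded
        return None
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Inference only, so skip gradient tracking
    with torch.no_grad():
        logits = model(**inputs).logits
        # Softmax over the class dimension; squeeze() drops the batch dimension
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    return {id2label[str(i)]: round(p, 3) for i, p in enumerate(probs)}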

# Gradio Interface
iface = gr.Interface(
    fn=classify_image,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=3, label="Image Classification"),
    title="IMAGENETTE - SigLIP2 Classifier",
    description="Upload an image to classify it into one of 10 categories from the Imagenette dataset."
)

if __name__ == "__main__":
    iface.launch()
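
The post-processing in classify_image (softmax over the logits, then a label-to-probability dictionary from which gr.Label displays the top three entries) can be sketched without loading the model. This is an illustrative pure-Python version, assuming the logits for a single image arrive as a flat list; `softmax` and `top_k` are hypothetical helpers, and the logits are invented.

```python
import math

# Class names in index order, matching the id2label mapping.
LABELS = [
    "tench", "english springer", "cassette player", "chain saw", "church",
    "French horn", "garbage truck", "gas pump", "golf ball", "parachute",
]


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(LABELS, probs), key=lambda pair: pair[1], reverse=True)
    return [(label, round(p, 3)) for label, p in ranked[:k]]


# Invented logits for a single image; index 8 (golf ball) is the largest.
example_logits = [0.2, -1.0, 0.5, 0.0, 1.1, -0.3, 0.4, 0.0, 4.0, 0.9]
for label, prob in top_k(example_logits):
    print(f"{label}: {prob}")
```

Subtracting the largest logit before exponentiating leaves the result unchanged but avoids overflow for large values.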

Intended Use

IMAGENETTE is designed for:

  • Educational purposes and model benchmarking.
  • Demonstrating the performance of SigLIP2 on a small but diverse classification task.
  • Illustrating fine-tuning workflows for vision-language models.

Model size: 92.9M params (Safetensors, tensor type F32)