Indian-Western-Food-34 is an image classification model fine-tuned from the vision-language encoder google/siglip2-base-patch16-224 for a single-label classification task. It classifies food images into 34 Indian and Western dishes using the SiglipForImageClassification architecture.
Classification Report:
                 precision    recall  f1-score   support

Baked Potato        0.9912    0.9780    0.9846      1500
Crispy Chicken      0.9811    0.9707    0.9759      1500
Donut               0.9893    0.9893    0.9893      1500
Fries               0.9742    0.9827    0.9784      1500
Hot Dog             0.9830    0.9735    0.9783      1548
Sandwich            0.9898    0.9673    0.9784      1500
Taco                0.9327    0.9427    0.9377      1500
Taquito             0.9624    0.9387    0.9504      1500
Apple Pie           0.9666    0.9540    0.9602      1000
Burger              0.9114    0.9940    0.9509       331
Butter Naan         0.9691    0.9186    0.9431       307
Chai                0.9801    1.0000    0.9899       344
Chapati             0.9188    0.9694    0.9435       327
Cheesecake          0.9573    0.9640    0.9606      1000
Chicken Curry       0.9610    0.9850    0.9728      1000
Chole Bhature       0.9841    0.9867    0.9854       376
Dal Makhani         0.9698    0.9797    0.9747       295
Dhokla              0.9959    0.9959    0.9959       245
Fried Rice          0.9485    1.0000    0.9736       350
Ice Cream           0.9569    0.9770    0.9668      1000
Idli                0.9934    1.0000    0.9967       302
Jalebi              0.9931    1.0000    0.9965       288
Kaathi Rolls        0.9640    0.9606    0.9623       279
Kadai Paneer        0.9848    0.9731    0.9789       334
Kulfi               0.9810    0.9673    0.9741       214
Masala Dosa         0.9890    0.9890    0.9890       273
Momos               0.9908    0.9969    0.9938       323
Omelette            0.9829    0.9790    0.9810      1000
Paani Puri          0.9281    0.9861    0.9562       144
Pakode              0.9738    0.9665    0.9701       269
Pav Bhaji           0.9901    0.9803    0.9852       305
Pizza               0.9647    0.9927    0.9785       275
Samosa              0.9878    0.9959    0.9918       244
Sushi               0.9969    0.9800    0.9884      1000

accuracy                                0.9729     23873
macro avg           0.9719    0.9775    0.9745     23873
weighted avg        0.9731    0.9729    0.9729     23873
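As a sanity check on the report, each f1-score is the harmonic mean of the precision and recall in the same row. The sketch below recomputes it for three rows copied from the table. The helper name `f1_from_pr` is hypothetical, and because the table values are rounded to four decimals, a recomputed score can differ from the reported one in the last digit.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1) copied from the classification report
rows = {
    "Burger": (0.9114, 0.9940, 0.9509),
    "Taco": (0.9327, 0.9427, 0.9377),
    "Fried Rice": (0.9485, 1.0000, 0.9736),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: recomputed={f1_from_pr(p, r):.4f} reported={reported:.4f}")
```

Burger shows why both columns matter: its recall is among the highest in the table, but the lower precision pulls the f1-score down to roughly 0.95.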
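The model categorizes images into 34 food classes. The class indices 0 to 33 follow the row order of the classification report above, from Baked Potato (0) to Sushi (33).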
# Install dependencies (notebook cell syntax)
!pip install -q transformers torch pillow gradio

import gradio as gr
from transformers import AutoImageProcessor
from transformers import SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/Indian-Western-Food-34"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

def food_classification(image):
    """Predicts the type of food in an image."""
    # Gradio passes the upload as a NumPy array; convert it to an RGB PIL image
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Inference only, so gradients are not needed
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        # Softmax over the class dimension gives one probability per class
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Class index -> dish name
    labels = {
        "0": "Baked Potato", "1": "Crispy Chicken", "2": "Donut", "3": "Fries",
        "4": "Hot Dog", "5": "Sandwich", "6": "Taco", "7": "Taquito", "8": "Apple Pie",
        "9": "Burger", "10": "Butter Naan", "11": "Chai", "12": "Chapati", "13": "Cheesecake",
        "14": "Chicken Curry", "15": "Chole Bhature", "16": "Dal Makhani", "17": "Dhokla",
        "18": "Fried Rice", "19": "Ice Cream", "20": "Idli", "21": "Jalebi", "22": "Kaathi Rolls",
        "23": "Kadai Paneer", "24": "Kulfi", "25": "Masala Dosa", "26": "Momos", "27": "Omelette",
        "28": "Paani Puri", "29": "Pakode", "30": "Pav Bhaji", "31": "Pizza", "32": "Samosa",
        "33": "Sushi"
    }
    # Map each dish name to its probability, rounded to three decimals
    predictions = {labels[str(i)]: round(probs[i], 3) for i in range(len(probs))}
    return predictions

# Create Gradio interface
iface = gr.Interface(
    fn=food_classification,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Prediction Scores"),
    title="Indian & Western Food Classification",
    description="Upload a food image to classify it into one of the 34 food types."
)

# Launch the app
if __name__ == "__main__":
    iface.launch()
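The food_classification function above returns a score for every one of the 34 classes. When only the most likely dishes matter, the returned dictionary can be trimmed. This is a minimal sketch, assuming a hypothetical helper `top_k` that is not part of the model card or of any library. The sample scores are made up for illustration and are not model output.

```python
from typing import Dict, List, Tuple


def top_k(predictions: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs, best first."""
    ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Made-up scores in the same {label: probability} shape that
# food_classification returns (only four classes shown).
sample = {"Pizza": 0.912, "Burger": 0.051, "Sandwich": 0.024, "Taco": 0.013}
print(top_k(sample, k=2))
```

If `k` exceeds the number of classes, the slice simply returns all available pairs in descending order.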
The Indian-Western-Food-34 model is designed to classify food images into Indian and Western dishes. Potential use cases include:
Base model: google/siglip2-base-patch16-224