---
license: apache-2.0
datasets:
- prithivMLmods/Shoe-Net-10K
language:
- en
base_model:
- google/siglip2-base-patch16-512
pipeline_tag: image-classification
library_name: transformers
tags:
- SigLIP2
- Ballet Flat
- Boat
- Sneaker
- Clog
- Brogue
---

# shoe-type-detection
> shoe-type-detection is an image classification model fine-tuned from `google/siglip2-base-patch16-512`, a SigLIP 2 vision-language encoder, for **multi-class image classification**. It is trained to classify images of shoes into five types: **Ballet Flats**, **Boat Shoes**, **Brogues**, **Clogs**, and **Sneakers**. The model uses the `SiglipForImageClassification` architecture.
> [!note]
> SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
> [https://arxiv.org/pdf/2502.14786](https://arxiv.org/pdf/2502.14786)
```
Classification Report:
              precision    recall  f1-score   support

 Ballet Flat     0.8980    0.9465    0.9216      2000
        Boat     0.9333    0.8750    0.9032      2000
      Brogue     0.9313    0.9490    0.9401      2000
        Clog     0.9244    0.8800    0.9016      2000
     Sneaker     0.9137    0.9480    0.9306      2000

    accuracy                         0.9197     10000
   macro avg     0.9202    0.9197    0.9194     10000
weighted avg     0.9202    0.9197    0.9194     10000
```

---
## Label Space: 5 Classes
```
Class 0: Ballet Flat
Class 1: Boat
Class 2: Brogue
Class 3: Clog
Class 4: Sneaker
```
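The model emits one logit per class in this order; the predicted label is the argmax of the softmax over those logits. A minimal sketch of that mapping, using hypothetical logits (not real model output) and a pure-Python softmax so no model download is needed:

```python
import math

# Label space from the table above
id2label = {0: "Ballet Flat", 1: "Boat", 2: "Brogue", 3: "Clog", 4: "Sneaker"}

# Hypothetical logits for a single image (illustration only)
logits = [0.2, -1.1, 3.4, 0.5, 1.2]

# Softmax over the 5 classes
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

# Top-1 prediction: index of the largest probability
pred = id2label[max(range(len(probs)), key=probs.__getitem__)]
print(pred)  # Brogue (index 2 has the largest logit)
```

The inference code below does the same thing with `torch.nn.functional.softmax`, returning the full probability dictionary instead of only the top-1 label.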
---
## Install Dependencies
```bash
pip install -q transformers torch pillow gradio hf_xet
```
---
## Inference Code
```python
import gradio as gr
import torch
from PIL import Image
from transformers import AutoImageProcessor, SiglipForImageClassification

# Load model and processor
model_name = "prithivMLmods/shoe-type-detection"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Label mapping
id2label = {
    "0": "Ballet Flat",
    "1": "Boat",
    "2": "Brogue",
    "3": "Clog",
    "4": "Sneaker"
}

def classify_image(image):
    # Convert the incoming numpy array to an RGB PIL image
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Map each class probability to its label
    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }
    return prediction

# Gradio interface
iface = gr.Interface(
    fn=classify_image,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=5, label="Shoe Type Classification"),
    title="Shoe Type Detection",
    description="Upload an image of a shoe to classify it as Ballet Flat, Boat, Brogue, Clog, or Sneaker."
)

if __name__ == "__main__":
    iface.launch()
```
---
## Intended Use
`shoe-type-detection` is designed for:
* **E-Commerce Automation** – Automate product tagging and classification in online retail platforms.
* **Footwear Inventory Management** – Efficiently organize and categorize large volumes of shoe images.
* **Retail Intelligence** – Enable AI-powered search and filtering based on shoe types.
* **Smart Surveillance** – Identify and analyze footwear types in surveillance footage for retail analytics.
* **Fashion and Apparel Research** – Analyze trends in shoe types and customer preferences using image datasets. |