siglip2-finetuned-marathi-sign-language
This model is a fine-tuned version of google/siglip2-base-patch16-224 on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.0006
- Model Preparation Time: 0.0057
- Accuracy: 0.9997
Model description
Marathi-Sign-Language-Detection is a vision-language model fine-tuned from google/siglip2-base-patch16-224 for multi-class image classification. It is trained to recognize Marathi sign language hand gestures and map each one to one of 43 Devanagari characters, using the SiglipForImageClassification architecture.
Training and evaluation data
Classification Report:
Class | Precision | Recall | F1-score | Support |
---|---|---|---|---|
अ | 1.0000 | 1.0000 | 1.0000 | 404 |
आ | 1.0000 | 1.0000 | 1.0000 | 409 |
इ | 1.0000 | 1.0000 | 1.0000 | 440 |
ई | 0.9866 | 1.0000 | 0.9932 | 441 |
उ | 1.0000 | 1.0000 | 1.0000 | 479 |
ऊ | 1.0000 | 1.0000 | 1.0000 | 428 |
ए | 1.0000 | 1.0000 | 1.0000 | 457 |
ऐ | 1.0000 | 1.0000 | 1.0000 | 436 |
ओ | 1.0000 | 1.0000 | 1.0000 | 430 |
औ | 1.0000 | 1.0000 | 1.0000 | 408 |
क | 1.0000 | 1.0000 | 1.0000 | 433 |
क्ष | 1.0000 | 1.0000 | 1.0000 | 480 |
ख | 1.0000 | 1.0000 | 1.0000 | 456 |
ग | 1.0000 | 1.0000 | 1.0000 | 444 |
घ | 1.0000 | 1.0000 | 1.0000 | 480 |
च | 1.0000 | 1.0000 | 1.0000 | 463 |
छ | 1.0000 | 1.0000 | 1.0000 | 468 |
ज | 1.0000 | 1.0000 | 1.0000 | 480 |
ज्ञ | 1.0000 | 1.0000 | 1.0000 | 480 |
झ | 1.0000 | 1.0000 | 1.0000 | 480 |
ट | 1.0000 | 1.0000 | 1.0000 | 480 |
ठ | 1.0000 | 1.0000 | 1.0000 | 480 |
ड | 1.0000 | 1.0000 | 1.0000 | 480 |
ढ | 1.0000 | 1.0000 | 1.0000 | 480 |
ण | 1.0000 | 1.0000 | 1.0000 | 480 |
त | 1.0000 | 1.0000 | 1.0000 | 480 |
थ | 1.0000 | 1.0000 | 1.0000 | 480 |
द | 1.0000 | 0.9875 | 0.9937 | 480 |
ध | 1.0000 | 1.0000 | 1.0000 | 480 |
न | 1.0000 | 1.0000 | 1.0000 | 480 |
प | 1.0000 | 1.0000 | 1.0000 | 480 |
फ | 1.0000 | 1.0000 | 1.0000 | 480 |
ब | 1.0000 | 1.0000 | 1.0000 | 480 |
भ | 1.0000 | 1.0000 | 1.0000 | 480 |
म | 1.0000 | 1.0000 | 1.0000 | 480 |
य | 1.0000 | 1.0000 | 1.0000 | 480 |
र | 1.0000 | 1.0000 | 1.0000 | 484 |
ल | 1.0000 | 1.0000 | 1.0000 | 480 |
ळ | 1.0000 | 1.0000 | 1.0000 | 480 |
व | 1.0000 | 1.0000 | 1.0000 | 480 |
श | 1.0000 | 1.0000 | 1.0000 | 480 |
स | 1.0000 | 1.0000 | 1.0000 | 480 |
ह | 1.0000 | 1.0000 | 1.0000 | 480 |
accuracy | | | 0.9997 | 20040 |
macro avg | 0.9997 | 0.9997 | 0.9997 | 20040 |
weighted avg | 0.9997 | 0.9997 | 0.9997 | 20040 |
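Only two rows in the report are not perfect: ई (precision 0.9866) and द (recall 0.9875). The report does not include a confusion matrix, but the figures are consistent with 6 द images being predicted as ई and no other errors. The sketch below checks that reading against the reported numbers; the value `confused = 6` is an inference from the rounded metrics, not something stated in the report.

```python
# Supports and total copied from the classification report.
support_ii = 441   # ई
support_da = 480   # द
total = 20040

# Assumption: 6 द images were predicted as ई, and nothing else was misclassified.
confused = 6

precision_ii = support_ii / (support_ii + confused)        # ई gains 6 false positives
recall_da = (support_da - confused) / support_da           # द loses 6 true positives
f1_ii = 2 * precision_ii * 1.0 / (precision_ii + 1.0)      # ई recall is 1.0
f1_da = 2 * 1.0 * recall_da / (1.0 + recall_da)            # द precision is 1.0
accuracy = (total - confused) / total

print(f"precision(ई)={precision_ii:.4f}  f1(ई)={f1_ii:.4f}")
print(f"recall(द)={recall_da:.4f}  f1(द)={f1_da:.4f}")
print(f"accuracy={accuracy:.4f}")
```

Rounded to four decimals, each value matches the corresponding entry in the report.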
Install Dependencies
pip install -q transformers torch pillow gradio
Inference Code
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "VinayHajare/siglip2-finetuned-marathi-sign-language"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Marathi label mapping
id2label = {
    "0": "अ", "1": "आ", "2": "इ", "3": "ई", "4": "उ", "5": "ऊ",
    "6": "ए", "7": "ऐ", "8": "ओ", "9": "औ", "10": "क", "11": "क्ष",
    "12": "ख", "13": "ग", "14": "घ", "15": "च", "16": "छ", "17": "ज",
    "18": "ज्ञ", "19": "झ", "20": "ट", "21": "ठ", "22": "ड", "23": "ढ",
    "24": "ण", "25": "त", "26": "थ", "27": "द", "28": "ध", "29": "न",
    "30": "प", "31": "फ", "32": "ब", "33": "भ", "34": "म", "35": "य",
    "36": "र", "37": "ल", "38": "ळ", "39": "व", "40": "श", "41": "स", "42": "ह"
}

def classify_marathi_sign(image):
    # Gradio passes a NumPy array; convert it to an RGB PIL image
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()
    # Map each class index to its Devanagari character and probability
    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }
    return prediction

# Gradio Interface
iface = gr.Interface(
    fn=classify_marathi_sign,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=5, label="Marathi Sign Classification"),
    title="Marathi-Sign-Language-Detection",
    description="Upload an image of a Marathi sign language hand gesture to identify the corresponding character."
)

if __name__ == "__main__":
    iface.launch()
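The post-processing in classify_marathi_sign (softmax over the logits, then a label-to-probability mapping from which the Gradio Label shows the top 5) can be sketched without loading the model. The helper name `top_k_labels`, the three-class label subset, and the logits below are hypothetical and made up for illustration; they are not outputs of the actual model.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_labels(logits, id2label, k=5):
    """Hypothetical helper: return the k most probable (label, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[str(i)], round(probs[i], 3)) for i in ranked[:k]]

# Made-up logits for a 3-class subset of the label map, for illustration only.
demo_labels = {"0": "अ", "1": "आ", "2": "इ"}
demo_logits = [2.0, 0.5, -1.0]
demo_top2 = top_k_labels(demo_logits, demo_labels, k=2)
print(demo_top2)
```

With the real model, the 43 logits from `outputs.logits` would take the place of `demo_logits`, and the full id2label mapping would replace `demo_labels`.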
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-06
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- num_epochs: 6
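To make the learning-rate settings concrete, here is a minimal sketch of the schedule they imply, assuming the usual shape of a linear scheduler with warmup (a linear ramp from 0 to the peak rate, then linear decay to 0, as in the transformers function `get_linear_schedule_with_warmup`). The total of 5640 steps is the final step count reported in the training results; the function name `linear_lr` is hypothetical.

```python
def linear_lr(step, peak_lr=2e-6, warmup_steps=50, total_steps=5640):
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        # Ramp up from 0 to peak_lr over the warmup steps.
        return peak_lr * step / warmup_steps
    # Decay linearly from peak_lr at warmup_steps down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)

for step in (0, 25, 50, 2820, 5640):
    print(step, f"{linear_lr(step):.3e}")
```

Under these assumptions the rate peaks at 2e-06 after step 50, less than 1% of the way through training, so nearly the whole run takes place in the decay phase.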
Training results
Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Accuracy |
---|---|---|---|---|---|
1.4439 | 1.0 | 940 | 0.0090 | 0.0057 | 0.9980 |
0.0052 | 2.0 | 1880 | 0.0035 | 0.0057 | 0.9993 |
0.0031 | 3.0 | 2820 | 0.0016 | 0.0057 | 0.9997 |
0.001 | 4.0 | 3760 | 0.0010 | 0.0057 | 0.9997 |
0.0007 | 5.0 | 4700 | 0.0013 | 0.0057 | 0.9997 |
0.0005 | 6.0 | 5640 | 0.0006 | 0.0057 | 0.9997 |
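The card does not name the dataset, but the step counts give a rough bound on the training set size. The sketch below assumes a single device, no gradient accumulation, and that the last partial batch of each epoch is kept; none of these is stated in the card, so treat the result as an estimate.

```python
import math

steps_per_epoch = 940    # step count at epoch 1.0 in the results table
train_batch_size = 32    # from the hyperparameter list

# Assumption: one device, no gradient accumulation, last partial batch kept.
max_train_examples = steps_per_epoch * train_batch_size
min_train_examples = (steps_per_epoch - 1) * train_batch_size + 1

print(min_train_examples, max_train_examples)
```

Under these assumptions the training set holds between 30049 and 30080 images, alongside the 20040 evaluation images counted in the classification report.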
Framework versions
- Transformers 4.52.0.dev0
- Pytorch 2.6.0+cu124
- Datasets 3.5.1
- Tokenizers 0.21.1