siglip2-finetuned-marathi-sign-language

This model is a fine-tuned version of google/siglip2-base-patch16-224. The training dataset is not named in this card. It achieves the following results on the evaluation set:

Loss: 0.0006
Model Preparation Time: 0.0057
Accuracy: 0.9997

Model description

Marathi-Sign-Language-Detection is a vision-language model fine-tuned from google/siglip2-base-patch16-224 for multi-class image classification. It is trained to recognize Marathi sign language hand gestures and map each image to one of 43 Devanagari characters using the SiglipForImageClassification architecture.

Training and evaluation data

Classification report (20,040 images across 43 classes):
                precision    recall  f1-score   support

           अ     1.0000    1.0000    1.0000       404
           आ     1.0000    1.0000    1.0000       409
           इ     1.0000    1.0000    1.0000       440
           ई     0.9866    1.0000    0.9932       441
           उ     1.0000    1.0000    1.0000       479
           ऊ     1.0000    1.0000    1.0000       428
           ए     1.0000    1.0000    1.0000       457
           ऐ     1.0000    1.0000    1.0000       436
           ओ     1.0000    1.0000    1.0000       430
           औ     1.0000    1.0000    1.0000       408
           क     1.0000    1.0000    1.0000       433
           क्ष     1.0000    1.0000    1.0000       480
           ख     1.0000    1.0000    1.0000       456
           ग     1.0000    1.0000    1.0000       444
           घ     1.0000    1.0000    1.0000       480
           च     1.0000    1.0000    1.0000       463
           छ     1.0000    1.0000    1.0000       468
           ज     1.0000    1.0000    1.0000       480
           ज्ञ     1.0000    1.0000    1.0000       480
           झ     1.0000    1.0000    1.0000       480
           ट     1.0000    1.0000    1.0000       480
           ठ     1.0000    1.0000    1.0000       480
           ड     1.0000    1.0000    1.0000       480
           ढ     1.0000    1.0000    1.0000       480
           ण     1.0000    1.0000    1.0000       480
           त     1.0000    1.0000    1.0000       480
           थ     1.0000    1.0000    1.0000       480
           द     1.0000    0.9875    0.9937       480
           ध     1.0000    1.0000    1.0000       480
           न     1.0000    1.0000    1.0000       480
           प     1.0000    1.0000    1.0000       480
           फ     1.0000    1.0000    1.0000       480
           ब     1.0000    1.0000    1.0000       480
           भ     1.0000    1.0000    1.0000       480
           म     1.0000    1.0000    1.0000       480
           य     1.0000    1.0000    1.0000       480
           र     1.0000    1.0000    1.0000       484
           ल     1.0000    1.0000    1.0000       480
           ळ     1.0000    1.0000    1.0000       480
           व     1.0000    1.0000    1.0000       480
           श     1.0000    1.0000    1.0000       480
           स     1.0000    1.0000    1.0000       480
           ह     1.0000    1.0000    1.0000       480

    accuracy                         0.9997     20040
   macro avg     0.9997    0.9997    0.9997     20040
weighted avg     0.9997    0.9997    0.9997     20040
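
The two rows that are not exactly 1.0000 can be cross-checked from the support counts alone. The sketch below assumes that the 6 missed द images were all predicted as ई. This is inferred rather than stated in the report: ई is the only class with precision below 1, so it is the only assignment consistent with the other rows. The variable names are illustrative.

```python
# Cross-check of the classification report using only its support counts.
# Assumption: the 6 misclassified द images were all predicted as ई, which is
# the only assignment consistent with every other row being exactly 1.0000.
total = 20040          # total evaluation images (from the report)
support_i = 441        # support for ई
support_da = 480       # support for द
errors = 6             # द images predicted as ई (inferred, see above)

precision_i = support_i / (support_i + errors)   # true ई / everything predicted ई
recall_da = (support_da - errors) / support_da   # correct द / all द
f1_i = 2 * precision_i * 1.0 / (precision_i + 1.0)   # recall of ई is 1.0
f1_da = 2 * 1.0 * recall_da / (1.0 + recall_da)      # precision of द is 1.0
accuracy = (total - errors) / total

print(f"{precision_i:.4f} {recall_da:.4f} {f1_i:.4f} {f1_da:.4f} {accuracy:.4f}")
# prints: 0.9866 0.9875 0.9932 0.9937 0.9997
```

Under that assumption, a total of 6 errors out of 20,040 images reproduces every non-perfect figure in the report.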

Install Dependencies

pip install -q transformers torch pillow gradio

Inference Code

import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "VinayHajare/siglip2-finetuned-marathi-sign-language"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Marathi label mapping
id2label = {
    "0": "अ", "1": "आ", "2": "इ", "3": "ई", "4": "उ", "5": "ऊ",
    "6": "ए", "7": "ऐ", "8": "ओ", "9": "औ", "10": "क", "11": "क्ष",
    "12": "ख", "13": "ग", "14": "घ", "15": "च", "16": "छ", "17": "ज",
    "18": "ज्ञ", "19": "झ", "20": "ट", "21": "ठ", "22": "ड", "23": "ढ",
    "24": "ण", "25": "त", "26": "थ", "27": "द", "28": "ध", "29": "न",
    "30": "प", "31": "फ", "32": "ब", "33": "भ", "34": "म", "35": "य",
    "36": "र", "37": "ल", "38": "ळ", "39": "व", "40": "श", "41": "स", "42": "ह"
}

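def classify_marathi_sign(image):
    """Return a {character: probability} dict for one image given as a NumPy array."""
    # Gradio passes a NumPy array; convert it to an RGB PIL image for the processor
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Inference only, so gradient tracking is disabled
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Map each class index to its Devanagari label
    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }

    return prediction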

# Gradio Interface
iface = gr.Interface(
    fn=classify_marathi_sign,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=5, label="Marathi Sign Classification"),
    title="Marathi-Sign-Language-Detection",
    description="Upload an image of a Marathi sign language hand gesture to identify the corresponding character."
)

if __name__ == "__main__":
    iface.launch()
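
classify_marathi_sign returns a probability for every one of the 43 classes, and the gr.Label component then displays only the top 5. Outside Gradio, the same reduction can be done by hand. The following is a minimal sketch; the helper name top_k_predictions and the example probabilities are hypothetical and are not real model output.

```python
def top_k_predictions(prediction, k=5):
    """Return the k highest-probability (label, probability) pairs, best first."""
    ranked = sorted(prediction.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Made-up probabilities for illustration only, not real model output.
example = {"अ": 0.01, "ई": 0.12, "द": 0.85, "क": 0.02}
print(top_k_predictions(example, k=2))
# prints: [('द', 0.85), ('ई', 0.12)]
```

Passing the dictionary returned by classify_marathi_sign to this helper would give the same ranking that the Gradio label widget shows.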

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-06
train_batch_size: 32
eval_batch_size: 8
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 50
num_epochs: 6
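
To make the learning-rate settings concrete, the sketch below computes the rate at a given optimizer step. It assumes the usual behaviour of a "linear" scheduler with warmup: a linear ramp from 0 to the peak rate over the warmup steps, then a linear decay to 0 at the final step. The total of 5640 steps is taken from the last row of the training results. The function name linear_lr is hypothetical.

```python
def linear_lr(step, peak_lr=2e-06, warmup_steps=50, total_steps=5640):
    """Learning rate at `step` under linear warmup followed by linear decay.

    Assumed schedule shape (not confirmed by the card): ramp from 0 to peak_lr
    over warmup_steps, then decay linearly to 0 at total_steps.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


# Start, mid-warmup, peak, halfway through the decay, and the final step.
for step in (0, 25, 50, 2845, 5640):
    print(step, linear_lr(step))
```

With these values the rate peaks at 2e-06 after 50 steps, so warmup covers under 1% of the 5640 total steps.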

Training results

Training Loss	Epoch	Step	Validation Loss	Model Preparation Time	Accuracy
1.4439	1.0	940	0.0090	0.0057	0.9980
0.0052	2.0	1880	0.0035	0.0057	0.9993
0.0031	3.0	2820	0.0016	0.0057	0.9997
0.0010	4.0	3760	0.0010	0.0057	0.9997
0.0007	5.0	4700	0.0013	0.0057	0.9997
0.0005	6.0	5640	0.0006	0.0057	0.9997

Framework versions

Transformers 4.52.0.dev0
Pytorch 2.6.0+cu124
Datasets 3.5.1
Tokenizers 0.21.1
