siglip2-finetuned-marathi-sign-language

This model is a fine-tuned version of google/siglip2-base-patch16-224 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0006
  • Model Preparation Time: 0.0057
  • Accuracy: 0.9997

Model description

Marathi-Sign-Language-Detection is a vision-language model fine-tuned from google/siglip2-base-patch16-224 for multi-class image classification. It is trained to recognize Marathi sign language hand gestures and map each one to one of 43 Devanagari characters using the SiglipForImageClassification architecture.

Training and evaluation data

Classification Report:
                precision    recall  f1-score   support

           अ     1.0000    1.0000    1.0000       404
           आ     1.0000    1.0000    1.0000       409
           इ     1.0000    1.0000    1.0000       440
           ई     0.9866    1.0000    0.9932       441
           उ     1.0000    1.0000    1.0000       479
           ऊ     1.0000    1.0000    1.0000       428
           ए     1.0000    1.0000    1.0000       457
           ऐ     1.0000    1.0000    1.0000       436
           ओ     1.0000    1.0000    1.0000       430
           औ     1.0000    1.0000    1.0000       408
           क     1.0000    1.0000    1.0000       433
           क्ष     1.0000    1.0000    1.0000       480
           ख     1.0000    1.0000    1.0000       456
           ग     1.0000    1.0000    1.0000       444
           घ     1.0000    1.0000    1.0000       480
           च     1.0000    1.0000    1.0000       463
           छ     1.0000    1.0000    1.0000       468
           ज     1.0000    1.0000    1.0000       480
           ज्ञ     1.0000    1.0000    1.0000       480
           झ     1.0000    1.0000    1.0000       480
           ट     1.0000    1.0000    1.0000       480
           ठ     1.0000    1.0000    1.0000       480
           ड     1.0000    1.0000    1.0000       480
           ढ     1.0000    1.0000    1.0000       480
           ण     1.0000    1.0000    1.0000       480
           त     1.0000    1.0000    1.0000       480
           थ     1.0000    1.0000    1.0000       480
           द     1.0000    0.9875    0.9937       480
           ध     1.0000    1.0000    1.0000       480
           न     1.0000    1.0000    1.0000       480
           प     1.0000    1.0000    1.0000       480
           फ     1.0000    1.0000    1.0000       480
           ब     1.0000    1.0000    1.0000       480
           भ     1.0000    1.0000    1.0000       480
           म     1.0000    1.0000    1.0000       480
           य     1.0000    1.0000    1.0000       480
           र     1.0000    1.0000    1.0000       484
           ल     1.0000    1.0000    1.0000       480
           ळ     1.0000    1.0000    1.0000       480
           व     1.0000    1.0000    1.0000       480
           श     1.0000    1.0000    1.0000       480
           स     1.0000    1.0000    1.0000       480
           ह     1.0000    1.0000    1.0000       480

    accuracy                         0.9997     20040
   macro avg     0.9997    0.9997    0.9997     20040
weighted avg     0.9997    0.9997    0.9997     20040
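
The two rows in the report that are not perfect can be sanity-checked by hand. The sketch below recomputes precision, recall, and F1 for them, plus the overall accuracy. The raw counts (6 images of one class predicted as the support-441 class) are an assumption inferred from the rounded figures, not numbers published with the model, and `prf` is an illustrative helper name.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from raw true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Assumed counts: 6 images were wrongly assigned to the support-441 class.
over_predicted = prf(tp=441, fp=6, fn=0)    # row with precision 0.9866
under_recalled = prf(tp=474, fp=0, fn=6)    # row with recall 0.9875
accuracy = (20040 - 6) / 20040              # 6 errors out of 20040 images

print([round(v, 4) for v in over_predicted])   # [0.9866, 1.0, 0.9932]
print([round(v, 4) for v in under_recalled])   # [1.0, 0.9875, 0.9937]
print(round(accuracy, 4))                      # 0.9997
```

Under that assumption, six misclassified images account for every non-perfect number in the report.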

Install Dependencies

pip install -q transformers torch pillow gradio

Inference Code

import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "VinayHajare/siglip2-finetuned-marathi-sign-language"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Marathi label mapping
id2label = {
    "0": "अ", "1": "आ", "2": "इ", "3": "ई", "4": "उ", "5": "ऊ",
    "6": "ए", "7": "ऐ", "8": "ओ", "9": "औ", "10": "क", "11": "क्ष",
    "12": "ख", "13": "ग", "14": "घ", "15": "च", "16": "छ", "17": "ज",
    "18": "ज्ञ", "19": "झ", "20": "ट", "21": "ठ", "22": "ड", "23": "ढ",
    "24": "ण", "25": "त", "26": "थ", "27": "द", "28": "ध", "29": "न",
    "30": "प", "31": "फ", "32": "ब", "33": "भ", "34": "म", "35": "य",
    "36": "र", "37": "ल", "38": "ळ", "39": "व", "40": "श", "41": "स", "42": "ह"
}

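def classify_marathi_sign(image):
    """Return a {character: probability} dict for one image given as a NumPy array."""
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Inference only, so gradient tracking is disabled
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        # Softmax over the class dimension yields one probability per character
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Map each class index to its Devanagari character
    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }

    return prediction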

# Gradio Interface
iface = gr.Interface(
    fn=classify_marathi_sign,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=5, label="Marathi Sign Classification"),
    title="Marathi-Sign-Language-Detection",
    description="Upload an image of a Marathi sign language hand gesture to identify the corresponding character."
)

if __name__ == "__main__":
    iface.launch()
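
Outside Gradio, the dictionary returned by `classify_marathi_sign` can be ranked directly. A minimal sketch, assuming the `{character: probability}` shape produced by that function; `top_k` and the sample scores are hypothetical and only for illustration.

```python
def top_k(prediction, k=5):
    """Return the k highest-probability (label, score) pairs, best first."""
    ranked = sorted(prediction.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

# Made-up scores standing in for a real model output
sample = {"अ": 0.012, "आ": 0.951, "इ": 0.004, "ई": 0.033}
print(top_k(sample, k=2))  # [('आ', 0.951), ('ई', 0.033)]
```

This mirrors what `gr.Label(num_top_classes=5)` displays, without needing the web interface.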

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-06
  • train_batch_size: 32
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 6
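
To make the schedule concrete: with a linear scheduler and 50 warmup steps, the learning rate ramps linearly from 0 to the 2e-06 peak, then decays linearly back to 0. The sketch below reproduces that shape in plain Python. It assumes 5640 total optimizer steps (the final step count reported in the training results) and follows the usual linear-warmup, linear-decay rule; `linear_lr` is an illustrative name, not a library function.

```python
PEAK_LR = 2e-06       # learning_rate
WARMUP_STEPS = 50     # lr_scheduler_warmup_steps
TOTAL_STEPS = 5640    # assumed: last step reported in the training results

def linear_lr(step):
    """Learning rate at a given optimizer step (linear warmup, then linear decay)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)

# Start of warmup, mid-warmup, peak, halfway through decay, final step
for step in (0, 25, 50, 2845, 5640):
    print(step, linear_lr(step))
```

With these values the rate is at half its peak both at step 25 and at step 2845.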

Training results

Training Loss   Epoch   Step   Validation Loss   Model Preparation Time   Accuracy
1.4439          1.0     940    0.0090            0.0057                   0.9980
0.0052          2.0     1880   0.0035            0.0057                   0.9993
0.0031          3.0     2820   0.0016            0.0057                   0.9997
0.001           4.0     3760   0.0010            0.0057                   0.9997
0.0007          5.0     4700   0.0013            0.0057                   0.9997
0.0005          6.0     5640   0.0006            0.0057                   0.9997
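
The step counts also give a rough size for the training set. A minimal sketch, assuming a single device, no gradient accumulation (neither is listed among the hyperparameters), and a final batch that may be partial, so each step consumes at most one batch of 32. It also assumes the evaluation set is the 20040 images counted in the classification report.

```python
import math

TRAIN_BATCH_SIZE = 32
STEPS_PER_EPOCH = 940    # step count reported after epoch 1.0
EVAL_BATCH_SIZE = 8
EVAL_EXAMPLES = 20040    # assumed: total support in the classification report

# The last batch of an epoch may be partial, so this is a range, not a point.
max_train_examples = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE
min_train_examples = (STEPS_PER_EPOCH - 1) * TRAIN_BATCH_SIZE + 1
eval_batches = math.ceil(EVAL_EXAMPLES / EVAL_BATCH_SIZE)

print(min_train_examples, max_train_examples)  # 30049 30080
print(eval_batches)                            # 2505
```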

Framework versions

  • Transformers 4.52.0.dev0
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.1
  • Tokenizers 0.21.1