---
library_name: transformers
license: apache-2.0
base_model: google/siglip2-base-patch16-224
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: siglip2-finetuned-marathi-sign-language
  results: []
datasets:
- VinayHajare/Marathi-Sign-Language
language:
- mr
pipeline_tag: image-classification
---


# siglip2-finetuned-marathi-sign-language

This model is a fine-tuned version of [google/siglip2-base-patch16-224](https://huggingface.co/google/siglip2-base-patch16-224) on the [VinayHajare/Marathi-Sign-Language](https://huggingface.co/datasets/VinayHajare/Marathi-Sign-Language) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0006
- Model Preparation Time: 0.0057
- Accuracy: 0.9997

## Model description

Marathi-Sign-Language-Detection is a vision-language model fine-tuned from `google/siglip2-base-patch16-224` for multi-class image classification. It recognizes Marathi sign language hand gestures and maps each one to one of 43 Devanagari characters, using the `SiglipForImageClassification` architecture.

## Training and evaluation data

Per-class metrics on the evaluation set (20,040 samples across 43 classes):

```text
Classification Report:
                precision    recall  f1-score   support

           अ     1.0000    1.0000    1.0000       404
           आ     1.0000    1.0000    1.0000       409
           इ     1.0000    1.0000    1.0000       440
           ई     0.9866    1.0000    0.9932       441
           उ     1.0000    1.0000    1.0000       479
           ऊ     1.0000    1.0000    1.0000       428
           ए     1.0000    1.0000    1.0000       457
           ऐ     1.0000    1.0000    1.0000       436
           ओ     1.0000    1.0000    1.0000       430
           औ     1.0000    1.0000    1.0000       408
           क     1.0000    1.0000    1.0000       433
           क्ष     1.0000    1.0000    1.0000       480
           ख     1.0000    1.0000    1.0000       456
           ग     1.0000    1.0000    1.0000       444
           घ     1.0000    1.0000    1.0000       480
           च     1.0000    1.0000    1.0000       463
           छ     1.0000    1.0000    1.0000       468
           ज     1.0000    1.0000    1.0000       480
           ज्ञ     1.0000    1.0000    1.0000       480
           झ     1.0000    1.0000    1.0000       480
           ट     1.0000    1.0000    1.0000       480
           ठ     1.0000    1.0000    1.0000       480
           ड     1.0000    1.0000    1.0000       480
           ढ     1.0000    1.0000    1.0000       480
           ण     1.0000    1.0000    1.0000       480
           त     1.0000    1.0000    1.0000       480
           थ     1.0000    1.0000    1.0000       480
           द     1.0000    0.9875    0.9937       480
           ध     1.0000    1.0000    1.0000       480
           न     1.0000    1.0000    1.0000       480
           प     1.0000    1.0000    1.0000       480
           फ     1.0000    1.0000    1.0000       480
           ब     1.0000    1.0000    1.0000       480
           भ     1.0000    1.0000    1.0000       480
           म     1.0000    1.0000    1.0000       480
           य     1.0000    1.0000    1.0000       480
           र     1.0000    1.0000    1.0000       484
           ल     1.0000    1.0000    1.0000       480
           ळ     1.0000    1.0000    1.0000       480
           व     1.0000    1.0000    1.0000       480
           श     1.0000    1.0000    1.0000       480
           स     1.0000    1.0000    1.0000       480
           ह     1.0000    1.0000    1.0000       480

    accuracy                         0.9997     20040
   macro avg     0.9997    0.9997    0.9997     20040
weighted avg     0.9997    0.9997    0.9997     20040
```
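
The two imperfect rows in the report are consistent with a single confusion. As a sanity check, and assuming every misclassified द sample was predicted as ई (the only class with precision below 1.0), the reported figures can be reproduced from the supports alone. This is a sketch with illustrative variable names, not part of the original evaluation code:

```python
# Figures copied from the classification report above.
total_samples = 20040
support_ii = 441      # support for ई
support_da = 480      # support for द
recall_da = 0.9875    # reported recall for द

# Number of द samples that were not recognised as द.
missed_da = round(support_da * (1 - recall_da))

# Assumption: all of them were predicted as ई, which lowers ई's precision.
precision_ii = support_ii / (support_ii + missed_da)

# Every other sample is classified correctly under this assumption.
accuracy = (total_samples - missed_da) / total_samples

print(missed_da)               # 6
print(round(precision_ii, 4))  # 0.9866
print(round(accuracy, 4))      # 0.9997
```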
---

## Install Dependencies

```bash
pip install -q transformers torch pillow gradio
```

---

## Inference Code

```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "VinayHajare/siglip2-finetuned-marathi-sign-language"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Marathi label mapping
id2label = {
    "0": "अ", "1": "आ", "2": "इ", "3": "ई", "4": "उ", "5": "ऊ",
    "6": "ए", "7": "ऐ", "8": "ओ", "9": "औ", "10": "क", "11": "क्ष",
    "12": "ख", "13": "ग", "14": "घ", "15": "च", "16": "छ", "17": "ज",
    "18": "ज्ञ", "19": "झ", "20": "ट", "21": "ठ", "22": "ड", "23": "ढ",
    "24": "ण", "25": "त", "26": "थ", "27": "द", "28": "ध", "29": "न",
    "30": "प", "31": "फ", "32": "ब", "33": "भ", "34": "म", "35": "य",
    "36": "र", "37": "ल", "38": "ळ", "39": "व", "40": "श", "41": "स", "42": "ह"
}

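def classify_marathi_sign(image):
    """Return a {character: probability} dict for one image given as a NumPy array."""
    if image is None:
        # Gradio passes None when the form is submitted without an image
        raise gr.Error("Please upload an image first.")

    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Inference only, so gradient tracking is not needed
    with torch.no_grad():
        logits = model(**inputs).logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze(0).tolist()

    # gr.Label expects a mapping from class name to confidence
    return {id2label[str(i)]: round(p, 3) for i, p in enumerate(probs)}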

# Gradio Interface
iface = gr.Interface(
    fn=classify_marathi_sign,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=5, label="Marathi Sign Classification"),
    title="Marathi-Sign-Language-Detection",
    description="Upload an image of a Marathi sign language hand gesture to identify the corresponding character."
)

if __name__ == "__main__":
    iface.launch()
```
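
For scripted use without the Gradio UI, the same forward pass can be wrapped in a plain function. This is a minimal sketch: `top_k_labels` and `predict_file` are hypothetical helper names, and `model`, `processor`, and `id2label` are assumed to be the objects defined in the block above.

```python
from typing import Dict, List, Tuple


def top_k_labels(probs: List[float], id2label: Dict[str, str],
                 k: int = 5) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [(id2label[str(index)], prob) for index, prob in ranked[:k]]


def predict_file(path, model, processor, id2label, k: int = 5):
    """Classify one image file and return its top-k predictions."""
    # Imported here so top_k_labels stays usable without torch or Pillow.
    import torch
    from PIL import Image

    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    # One image in the batch, so take row 0 of the probability matrix.
    probs = torch.softmax(logits, dim=-1)[0].tolist()
    return top_k_labels(probs, id2label, k)
```

For example, `predict_file("sign.jpg", model, processor, id2label)` would return up to five `(character, probability)` pairs, most probable first; `sign.jpg` is a placeholder path.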

---
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-06
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (`OptimizerNames.ADAMW_TORCH`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- num_epochs: 6
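
With `lr_scheduler_type: linear` and 50 warmup steps, the learning rate ramps from 0 up to 2e-06 and then decays linearly to 0 at the final step. The sketch below reproduces that shape in plain Python, assuming 5,640 total optimizer steps (the last step in the results table); `linear_lr` is an illustrative helper, not a Transformers function.

```python
def linear_lr(step: int, peak_lr: float = 2e-6, warmup_steps: int = 50,
              total_steps: int = 5640) -> float:
    """Learning rate at an optimizer step for linear warmup then linear decay."""
    if step < warmup_steps:
        # Warmup: scale linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: scale linearly from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Start, mid-warmup, peak, halfway through decay, and final step.
for step in (0, 25, 50, 2845, 5640):
    print(step, linear_lr(step))
```

Under these assumptions the rate peaks at step 50 and is back to half the peak by step 2845, so most of training runs well below the nominal 2e-06.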

### Training results

| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:----------------------:|:--------:|
| 1.4439        | 1.0   | 940  | 0.0090          | 0.0057                 | 0.9980   |
| 0.0052        | 2.0   | 1880 | 0.0035          | 0.0057                 | 0.9993   |
| 0.0031        | 3.0   | 2820 | 0.0016          | 0.0057                 | 0.9997   |
| 0.0010        | 4.0   | 3760 | 0.0010          | 0.0057                 | 0.9997   |
| 0.0007        | 5.0   | 4700 | 0.0013          | 0.0057                 | 0.9997   |
| 0.0005        | 6.0   | 5640 | 0.0006          | 0.0057                 | 0.9997   |
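
The step counts also hint at the size of the training split. Assuming a single device, no gradient accumulation, and a final partial batch that is kept (none of which the card states), 940 steps per epoch at batch size 32 bound the number of training samples:

```python
import math

steps_per_epoch = 940   # results table: step 940 at epoch 1.0
train_batch_size = 32   # hyperparameter list

# ceil(n / 32) == 940 holds for every n in this inclusive range.
smallest = (steps_per_epoch - 1) * train_batch_size + 1
largest = steps_per_epoch * train_batch_size

print(smallest, largest)  # 30049 30080
```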


### Framework versions

- Transformers 4.52.0.dev0
- PyTorch 2.6.0+cu124
- Datasets 3.5.1
- Tokenizers 0.21.1