---
library_name: transformers
license: apache-2.0
base_model: google/siglip2-base-patch16-224
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: siglip2-finetuned-marathi-sign-language
  results: []
datasets:
- VinayHajare/Marathi-Sign-Language
language:
- mr
pipeline_tag: image-classification
---
# siglip2-finetuned-marathi-sign-language

This model is a fine-tuned version of [google/siglip2-base-patch16-224](https://huggingface.co/google/siglip2-base-patch16-224) on the VinayHajare/Marathi-Sign-Language dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0006
- Model Preparation Time: 0.0057
- Accuracy: 0.9997

## Model description

Marathi-Sign-Language-Detection is a multi-class image classifier fine-tuned from the vision-language model google/siglip2-base-patch16-224. It recognizes Marathi sign language hand gestures and maps each one to one of 43 Devanagari characters using the `SiglipForImageClassification` architecture.

## Training and evaluation data

```text
Classification Report:
              precision    recall  f1-score   support

           अ     1.0000    1.0000    1.0000       404
           आ     1.0000    1.0000    1.0000       409
           इ     1.0000    1.0000    1.0000       440
           ई     0.9866    1.0000    0.9932       441
           उ     1.0000    1.0000    1.0000       479
           ऊ     1.0000    1.0000    1.0000       428
           ए     1.0000    1.0000    1.0000       457
           ऐ     1.0000    1.0000    1.0000       436
           ओ     1.0000    1.0000    1.0000       430
           औ     1.0000    1.0000    1.0000       408
           क     1.0000    1.0000    1.0000       433
          क्ष     1.0000    1.0000    1.0000       480
           ख     1.0000    1.0000    1.0000       456
           ग     1.0000    1.0000    1.0000       444
           घ     1.0000    1.0000    1.0000       480
           च     1.0000    1.0000    1.0000       463
           छ     1.0000    1.0000    1.0000       468
           ज     1.0000    1.0000    1.0000       480
          ज्ञ     1.0000    1.0000    1.0000       480
           झ     1.0000    1.0000    1.0000       480
           ट     1.0000    1.0000    1.0000       480
           ठ     1.0000    1.0000    1.0000       480
           ड     1.0000    1.0000    1.0000       480
           ढ     1.0000    1.0000    1.0000       480
           ण     1.0000    1.0000    1.0000       480
           त     1.0000    1.0000    1.0000       480
           थ     1.0000    1.0000    1.0000       480
           द     1.0000    0.9875    0.9937       480
           ध     1.0000    1.0000    1.0000       480
           न     1.0000    1.0000    1.0000       480
           प     1.0000    1.0000    1.0000       480
           फ     1.0000    1.0000    1.0000       480
           ब     1.0000    1.0000    1.0000       480
           भ     1.0000    1.0000    1.0000       480
           म     1.0000    1.0000    1.0000       480
           य     1.0000    1.0000    1.0000       480
           र     1.0000    1.0000    1.0000       484
           ल     1.0000    1.0000    1.0000       480
           ळ     1.0000    1.0000    1.0000       480
           व     1.0000    1.0000    1.0000       480
           श     1.0000    1.0000    1.0000       480
           स     1.0000    1.0000    1.0000       480
           ह     1.0000    1.0000    1.0000       480

    accuracy                         0.9997     20040
   macro avg     0.9997    0.9997    0.9997     20040
weighted avg     0.9997    0.9997    0.9997     20040
```
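
The two rows in the report that are not perfect appear to describe the same errors. As a hedged reading (no confusion matrix is published here), a precision of 0.9866 for ई over 441 true samples and a recall of 0.9875 for द over 480 samples would both correspond to six misclassified images, which would also match the overall accuracy of 0.9997 on 20,040 images. The sketch below recomputes those figures; `prf` is an illustrative helper and the counts are inferred from the report, not taken from the author's evaluation code.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Assumed counts inferred from the report:
# 6 images wrongly predicted as ई, and 6 images of द that were missed.
ii = prf(tp=441, fp=6, fn=0)
da = prf(tp=474, fp=0, fn=6)
accuracy = (20040 - 6) / 20040

print([round(v, 4) for v in ii])   # [0.9866, 1.0, 0.9932]
print([round(v, 4) for v in da])   # [1.0, 0.9875, 0.9937]
print(round(accuracy, 4))          # 0.9997
```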

---

## Install Dependencies

```bash
pip install -q transformers torch pillow gradio
```

---

## Inference Code

```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load the fine-tuned model and its image processor
model_name = "VinayHajare/siglip2-finetuned-marathi-sign-language"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Marathi label mapping: class index (as a string) -> Devanagari character
id2label = {
    "0": "अ", "1": "आ", "2": "इ", "3": "ई", "4": "उ", "5": "ऊ",
    "6": "ए", "7": "ऐ", "8": "ओ", "9": "औ", "10": "क", "11": "क्ष",
    "12": "ख", "13": "ग", "14": "घ", "15": "च", "16": "छ", "17": "ज",
    "18": "ज्ञ", "19": "झ", "20": "ट", "21": "ठ", "22": "ड", "23": "ढ",
    "24": "ण", "25": "त", "26": "थ", "27": "द", "28": "ध", "29": "न",
    "30": "प", "31": "फ", "32": "ब", "33": "भ", "34": "म", "35": "य",
    "36": "र", "37": "ल", "38": "ळ", "39": "व", "40": "श", "41": "स", "42": "ह"
}


def classify_marathi_sign(image):
    """Return a {character: probability} dict for one image given as a NumPy array."""
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Forward pass without gradient tracking
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        # Softmax over the class dimension, flattened to a plain list
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }

    return prediction


# Gradio Interface
iface = gr.Interface(
    fn=classify_marathi_sign,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=5, label="Marathi Sign Classification"),
    title="Marathi-Sign-Language-Detection",
    description="Upload an image of a Marathi sign language hand gesture to identify the corresponding character."
)

if __name__ == "__main__":
    iface.launch()
```
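
For use outside Gradio, the softmax-and-rank step can be separated from the model call. The following is a minimal sketch in plain Python and is not part of the original card: `top_k_predictions` is a hypothetical helper, and the three-class logits are made-up values for illustration. With the real model, `logits` would be `outputs.logits.squeeze().tolist()` and `labels` would be the `id2label` values in index order.

```python
import math


def top_k_predictions(logits, labels, k=5):
    """Softmax the logits and return the k most probable (label, probability) pairs."""
    # Subtract the largest logit before exponentiating for numerical stability
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank labels by probability, highest first
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up logits for three classes, for illustration only
example = top_k_predictions([2.0, 0.5, -1.0], ["अ", "आ", "इ"], k=2)
print(example)
```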

---

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-06
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (`OptimizerNames.ADAMW_TORCH`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- num_epochs: 6
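
With a linear scheduler and 50 warmup steps, the learning rate typically ramps linearly from 0 to the peak of 2e-06 and then decays linearly to 0 at the last step. The sketch below reproduces that shape in plain Python, assuming 5,640 optimizer steps in total (the final step count reported in the training results table); `linear_lr` is an illustrative helper, not the Trainer's own code.

```python
PEAK_LR = 2e-06      # learning_rate
WARMUP_STEPS = 50    # lr_scheduler_warmup_steps
TOTAL_STEPS = 5640   # assumed: 6 epochs x 940 steps per epoch


def linear_lr(step):
    """Learning rate at an optimizer step under linear warmup followed by linear decay."""
    if step < WARMUP_STEPS:
        # Warmup phase: scale up proportionally to the step count
        return PEAK_LR * step / WARMUP_STEPS
    # Decay phase: scale down proportionally to the steps remaining
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 25, 50, 2845, 5640):
    print(step, linear_lr(step))
```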

### Training results

| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:----------------------:|:--------:|
| 1.4439        | 1.0   | 940  | 0.0090          | 0.0057                 | 0.9980   |
| 0.0052        | 2.0   | 1880 | 0.0035          | 0.0057                 | 0.9993   |
| 0.0031        | 3.0   | 2820 | 0.0016          | 0.0057                 | 0.9997   |
| 0.001         | 4.0   | 3760 | 0.0010          | 0.0057                 | 0.9997   |
| 0.0007        | 5.0   | 4700 | 0.0013          | 0.0057                 | 0.9997   |
| 0.0005        | 6.0   | 5640 | 0.0006          | 0.0057                 | 0.9997   |

### Framework versions

- Transformers 4.52.0.dev0
- PyTorch 2.6.0+cu124
- Datasets 3.5.1
- Tokenizers 0.21.1