---
library_name: transformers
license: apache-2.0
base_model: google/siglip2-base-patch16-224
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: siglip2-finetuned-marathi-sign-language
  results: []
datasets:
- VinayHajare/Marathi-Sign-Language
language:
- mr
pipeline_tag: image-classification
---

# siglip2-finetuned-marathi-sign-language

This model is a fine-tuned version of [google/siglip2-base-patch16-224](https://huggingface.co/google/siglip2-base-patch16-224) on the `VinayHajare/Marathi-Sign-Language` dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0006
- Model Preparation Time: 0.0057
- Accuracy: 0.9997

## Model description

Marathi-Sign-Language-Detection is an image classifier fine-tuned from the vision-language model google/siglip2-base-patch16-224 for multi-class image classification. It is trained to recognize Marathi sign language hand gestures and map each one to one of 43 Devanagari characters, using the `SiglipForImageClassification` architecture.

## Training and evaluation data

Classification report on the evaluation set (20,040 images, 43 classes):

```text
Classification Report:
              precision    recall  f1-score   support

           अ     1.0000    1.0000    1.0000       404
           आ     1.0000    1.0000    1.0000       409
           इ     1.0000    1.0000    1.0000       440
           ई     0.9866    1.0000    0.9932       441
           उ     1.0000    1.0000    1.0000       479
           ऊ     1.0000    1.0000    1.0000       428
           ए     1.0000    1.0000    1.0000       457
           ऐ     1.0000    1.0000    1.0000       436
           ओ     1.0000    1.0000    1.0000       430
           औ     1.0000    1.0000    1.0000       408
           क     1.0000    1.0000    1.0000       433
          क्ष     1.0000    1.0000    1.0000       480
           ख     1.0000    1.0000    1.0000       456
           ग     1.0000    1.0000    1.0000       444
           घ     1.0000    1.0000    1.0000       480
           च     1.0000    1.0000    1.0000       463
           छ     1.0000    1.0000    1.0000       468
           ज     1.0000    1.0000    1.0000       480
          ज्ञ     1.0000    1.0000    1.0000       480
           झ     1.0000    1.0000    1.0000       480
           ट     1.0000    1.0000    1.0000       480
           ठ     1.0000    1.0000    1.0000       480
           ड     1.0000    1.0000    1.0000       480
           ढ     1.0000    1.0000    1.0000       480
           ण     1.0000    1.0000    1.0000       480
           त     1.0000    1.0000    1.0000       480
           थ     1.0000    1.0000    1.0000       480
           द     1.0000    0.9875    0.9937       480
           ध     1.0000    1.0000    1.0000       480
           न     1.0000    1.0000    1.0000       480
           प     1.0000    1.0000    1.0000       480
           फ     1.0000    1.0000    1.0000       480
           ब     1.0000    1.0000    1.0000       480
           भ     1.0000    1.0000    1.0000       480
           म     1.0000    1.0000    1.0000       480
           य     1.0000    1.0000    1.0000       480
           र     1.0000    1.0000    1.0000       484
           ल     1.0000    1.0000    1.0000       480
           ळ     1.0000    1.0000    1.0000       480
           व     1.0000    1.0000    1.0000       480
           श     1.0000    1.0000    1.0000       480
           स     1.0000    1.0000    1.0000       480
           ह     1.0000    1.0000    1.0000       480

    accuracy                         0.9997     20040
   macro avg     0.9997    0.9997    0.9997     20040
weighted avg     0.9997    0.9997    0.9997     20040
```

---

## Install Dependencies

```bash
pip install -q transformers torch pillow gradio
```

---

## Inference Code

```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "VinayHajare/siglip2-finetuned-marathi-sign-language"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Marathi label mapping (class index as a string -> Devanagari character)
id2label = {
    "0": "अ", "1": "आ", "2": "इ", "3": "ई", "4": "उ", "5": "ऊ", "6": "ए", "7": "ऐ", "8": "ओ", "9": "औ",
    "10": "क", "11": "क्ष", "12": "ख", "13": "ग", "14": "घ", "15": "च", "16": "छ", "17": "ज", "18": "ज्ञ", "19": "झ",
    "20": "ट", "21": "ठ", "22": "ड", "23": "ढ", "24": "ण", "25": "त", "26": "थ", "27": "द", "28": "ध", "29": "न",
    "30": "प", "31": "फ", "32": "ब", "33": "भ", "34": "म", "35": "य", "36": "र", "37": "ल", "38": "ळ", "39": "व",
    "40": "श", "41": "स", "42": "ह"
}

def classify_marathi_sign(image):
    # Gradio passes the upload as a NumPy array; convert it to an RGB PIL image
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Forward pass without gradient tracking
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Map every class probability to its Devanagari label
    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }

    return prediction

# Gradio Interface
iface = gr.Interface(
    fn=classify_marathi_sign,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=5, label="Marathi Sign Classification"),
    title="Marathi-Sign-Language-Detection",
    description="Upload an image of a Marathi sign language hand gesture to identify the corresponding character."
)

if __name__ == "__main__":
    iface.launch()
```

---

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-06
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (`OptimizerNames.ADAMW_TORCH`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- num_epochs: 6

### Training results

| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:----------------------:|:--------:|
| 1.4439        | 1.0   | 940  | 0.0090          | 0.0057                 | 0.9980   |
| 0.0052        | 2.0   | 1880 | 0.0035          | 0.0057                 | 0.9993   |
| 0.0031        | 3.0   | 2820 | 0.0016          | 0.0057                 | 0.9997   |
| 0.001         | 4.0   | 3760 | 0.0010          | 0.0057                 | 0.9997   |
| 0.0007        | 5.0   | 4700 | 0.0013          | 0.0057                 | 0.9997   |
| 0.0005        | 6.0   | 5640 | 0.0006          | 0.0057                 | 0.9997   |

### Framework versions

- Transformers 4.52.0.dev0
- Pytorch 2.6.0+cu124
- Datasets 3.5.1
- Tokenizers 0.21.1
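
### Learning rate schedule (sketch)

The training hyperparameters list a linear scheduler with 50 warmup steps and a peak learning rate of 2e-06, and the training results table ends at step 5640. The snippet below is a minimal sketch of what such a schedule looks like, assuming it follows the usual linear-warmup-then-linear-decay rule (as in `transformers.get_linear_schedule_with_warmup`). The function name `linear_lr` is hypothetical, and the code is an illustration, not the code used to train this model.

```python
def linear_lr(step, base_lr=2e-6, warmup_steps=50, total_steps=5640):
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero.

    Defaults are taken from the hyperparameters and training results above;
    the schedule shape itself is an assumption.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Decay: ramp linearly from base_lr down to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Print the learning rate at the end of warmup and at a few epoch boundaries
    for step in (0, 25, 50, 940, 2820, 5640):
        print(f"step {step:>4}: lr = {linear_lr(step):.3e}")
```

Under these assumptions the learning rate peaks at step 50 and reaches zero at step 5640, so most of the six epochs run at a rate below 2e-06.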