---
library_name: transformers
license: apache-2.0
base_model: google/siglip2-base-patch16-224
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: siglip2-finetuned-marathi-sign-language
  results: []
datasets:
- VinayHajare/Marathi-Sign-Language
language:
- mr
pipeline_tag: image-classification
---


# siglip2-finetuned-marathi-sign-language

This model is a fine-tuned version of [google/siglip2-base-patch16-224](https://huggingface.co/google/siglip2-base-patch16-224) on the `VinayHajare/Marathi-Sign-Language` dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0006
- Model Preparation Time: 0.0057 s
- Accuracy: 0.9997

## Model description

Marathi-Sign-Language-Detection is an image classification model fine-tuned from google/siglip2-base-patch16-224. Although the base SigLIP 2 checkpoint is a vision-language model, this classifier uses the `SiglipForImageClassification` architecture, which keeps only the vision encoder and adds a classification head. It recognizes Marathi sign language hand gestures and maps each one to one of 43 Devanagari characters.

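## Training and evaluation data

The model was fine-tuned on the `VinayHajare/Marathi-Sign-Language` dataset. Per-class classification report on the evaluation set (43 classes, 20,040 images):

| Class | Precision | Recall | F1-score | Support |
|:-----:|:---------:|:------:|:--------:|:-------:|
| अ | 1.0000 | 1.0000 | 1.0000 | 404 |
| आ | 1.0000 | 1.0000 | 1.0000 | 409 |
| इ | 1.0000 | 1.0000 | 1.0000 | 440 |
| ई | 0.9866 | 1.0000 | 0.9932 | 441 |
| उ | 1.0000 | 1.0000 | 1.0000 | 479 |
| ऊ | 1.0000 | 1.0000 | 1.0000 | 428 |
| ए | 1.0000 | 1.0000 | 1.0000 | 457 |
| ऐ | 1.0000 | 1.0000 | 1.0000 | 436 |
| ओ | 1.0000 | 1.0000 | 1.0000 | 430 |
| औ | 1.0000 | 1.0000 | 1.0000 | 408 |
| क | 1.0000 | 1.0000 | 1.0000 | 433 |
| क्ष | 1.0000 | 1.0000 | 1.0000 | 480 |
| ख | 1.0000 | 1.0000 | 1.0000 | 456 |
| ग | 1.0000 | 1.0000 | 1.0000 | 444 |
| घ | 1.0000 | 1.0000 | 1.0000 | 480 |
| च | 1.0000 | 1.0000 | 1.0000 | 463 |
| छ | 1.0000 | 1.0000 | 1.0000 | 468 |
| ज | 1.0000 | 1.0000 | 1.0000 | 480 |
| ज्ञ | 1.0000 | 1.0000 | 1.0000 | 480 |
| झ | 1.0000 | 1.0000 | 1.0000 | 480 |
| ट | 1.0000 | 1.0000 | 1.0000 | 480 |
| ठ | 1.0000 | 1.0000 | 1.0000 | 480 |
| ड | 1.0000 | 1.0000 | 1.0000 | 480 |
| ढ | 1.0000 | 1.0000 | 1.0000 | 480 |
| ण | 1.0000 | 1.0000 | 1.0000 | 480 |
| त | 1.0000 | 1.0000 | 1.0000 | 480 |
| थ | 1.0000 | 1.0000 | 1.0000 | 480 |
| द | 1.0000 | 0.9875 | 0.9937 | 480 |
| ध | 1.0000 | 1.0000 | 1.0000 | 480 |
| न | 1.0000 | 1.0000 | 1.0000 | 480 |
| प | 1.0000 | 1.0000 | 1.0000 | 480 |
| फ | 1.0000 | 1.0000 | 1.0000 | 480 |
| ब | 1.0000 | 1.0000 | 1.0000 | 480 |
| भ | 1.0000 | 1.0000 | 1.0000 | 480 |
| म | 1.0000 | 1.0000 | 1.0000 | 480 |
| य | 1.0000 | 1.0000 | 1.0000 | 480 |
| र | 1.0000 | 1.0000 | 1.0000 | 484 |
| ल | 1.0000 | 1.0000 | 1.0000 | 480 |
| ळ | 1.0000 | 1.0000 | 1.0000 | 480 |
| व | 1.0000 | 1.0000 | 1.0000 | 480 |
| श | 1.0000 | 1.0000 | 1.0000 | 480 |
| स | 1.0000 | 1.0000 | 1.0000 | 480 |
| ह | 1.0000 | 1.0000 | 1.0000 | 480 |
| **accuracy** | | | 0.9997 | 20040 |
| **macro avg** | 0.9997 | 0.9997 | 0.9997 | 20040 |
| **weighted avg** | 0.9997 | 0.9997 | 0.9997 | 20040 |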
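The report is consistent with a single error pattern: only ई has precision below 1 and only द has recall below 1, so the rounded figures imply that 6 of the 480 द images were predicted as ई (441/447 ≈ 0.9866, 474/480 = 0.9875, and (20040 - 6)/20040 ≈ 0.9997). This is an inference from rounded numbers, not a published confusion matrix. The stdlib sketch below recomputes the reported values from those assumed counts; `prf` and the constants are illustrative names.

```python
# Per-class metrics recomputed from assumed raw counts.
# ASSUMPTION: counts reconstructed from the rounded report above:
# 6 images of class द were predicted as ई; every other prediction was correct.

def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for a single class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1

NUM_CLASSES = 43
TOTAL = 20040           # total support from the report
MISCLASSIFIED = 6       # assumed द -> ई errors

# ई: all 441 images recalled, plus 6 false positives coming from द
p_ii, r_ii, f_ii = prf(tp=441, fp=MISCLASSIFIED, fn=0)
# द: 480 images, 6 of them missed
p_da, r_da, f_da = prf(tp=480 - MISCLASSIFIED, fp=0, fn=MISCLASSIFIED)

accuracy = (TOTAL - MISCLASSIFIED) / TOTAL
# The other 41 classes have precision = recall = F1 = 1.0
macro_f1 = (NUM_CLASSES - 2 + f_ii + f_da) / NUM_CLASSES

print(f"ई  precision={p_ii:.4f} recall={r_ii:.4f} f1={f_ii:.4f}")
print(f"द  precision={p_da:.4f} recall={r_da:.4f} f1={f_da:.4f}")
print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f}")
```

The printed values match the ई and द rows, the overall accuracy, and the macro-averaged F1 (0.9997) in the report.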
---

## Install Dependencies

```bash
pip install -q transformers torch pillow gradio
```

---

## Inference Code

```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "VinayHajare/siglip2-finetuned-marathi-sign-language"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Marathi label mapping (class index -> Devanagari character)
id2label = {
    "0": "अ", "1": "आ", "2": "इ", "3": "ई", "4": "उ", "5": "ऊ",
    "6": "ए", "7": "ऐ", "8": "ओ", "9": "औ", "10": "क", "11": "क्ष",
    "12": "ख", "13": "ग", "14": "घ", "15": "च", "16": "छ", "17": "ज",
    "18": "ज्ञ", "19": "झ", "20": "ट", "21": "ठ", "22": "ड", "23": "ढ",
    "24": "ण", "25": "त", "26": "थ", "27": "द", "28": "ध", "29": "न",
    "30": "प", "31": "फ", "32": "ब", "33": "भ", "34": "म", "35": "य",
    "36": "र", "37": "ल", "38": "ळ", "39": "व", "40": "श", "41": "स", "42": "ह"
}

def classify_marathi_sign(image):
    # Gradio passes a NumPy array; convert it to an RGB PIL image
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Forward pass without gradient tracking
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Map each class to its probability, rounded to 3 decimals
    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }

    return prediction

# Gradio Interface
iface = gr.Interface(
    fn=classify_marathi_sign,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=5, label="Marathi Sign Classification"),
    title="Marathi-Sign-Language-Detection",
    description="Upload an image of a Marathi sign language hand gesture to identify the corresponding character."
)

if __name__ == "__main__":
    iface.launch()
```
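`gr.Label(num_top_classes=5)` displays only the five highest-scoring entries of the returned dictionary. If you want the same ranking outside Gradio (for example, in a batch script), a minimal stdlib sketch of the softmax and top-k step follows. `softmax`, `top_k`, and the toy logits are hypothetical names and values for illustration, not part of the model's API; in practice you would pass `logits.squeeze().tolist()` and the `id2label` dictionary from the inference code.

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a 1-D list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs: list[float], labels: dict[str, str], k: int = 5) -> list[tuple[str, float]]:
    """Return the k most probable (label, probability) pairs, highest first."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[str(i)], probs[i]) for i in ranked[:k]]

# Hypothetical 3-class label map and logits (illustration only)
toy_labels = {"0": "अ", "1": "आ", "2": "इ"}
toy_logits = [2.0, 0.5, -1.0]
probs = softmax(toy_logits)
print(top_k(probs, toy_labels, k=2))
```

Subtracting the maximum logit before exponentiating leaves the probabilities unchanged but avoids overflow for large logits.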

---

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-06
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- num_epochs: 6
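With `lr_scheduler_type: linear` and 50 warmup steps, the learning rate ramps from 0 up to 2e-06 and then decays linearly to 0 by the last optimizer step. At 940 steps per epoch over 6 epochs that is 5,640 steps, which matches the final step in the training results table. The sketch below mirrors the formula behind `transformers.get_linear_schedule_with_warmup`; `linear_warmup_lr` is a hypothetical helper, not a library function.

```python
def linear_warmup_lr(step: int, base_lr: float = 2e-6, warmup_steps: int = 50,
                     total_steps: int = 5640) -> float:
    """Learning rate at a given optimizer step for linear warmup + linear decay."""
    if step < warmup_steps:
        # Ramp linearly from 0 to base_lr over the warmup steps
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr at the end of warmup to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Learning rate at a few milestones: start, mid-warmup, peak, end of epoch 1, end
for s in (0, 25, 50, 940, 5640):
    print(s, linear_warmup_lr(s))
```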

### Training results

| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time (s) | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------------------------:|:--------:|
| 1.4439 | 1.0 | 940 | 0.0090 | 0.0057 | 0.9980 |
| 0.0052 | 2.0 | 1880 | 0.0035 | 0.0057 | 0.9993 |
| 0.0031 | 3.0 | 2820 | 0.0016 | 0.0057 | 0.9997 |
| 0.001 | 4.0 | 3760 | 0.0010 | 0.0057 | 0.9997 |
| 0.0007 | 5.0 | 4700 | 0.0013 | 0.0057 | 0.9997 |
| 0.0005 | 6.0 | 5640 | 0.0006 | 0.0057 | 0.9997 |


### Framework versions

- Transformers 4.52.0.dev0
- Pytorch 2.6.0+cu124
- Datasets 3.5.1
- Tokenizers 0.21.1