---
library_name: transformers
tags:
- whisper
- hindi
- asr
- peft
- lora
license: apache-2.0
datasets:
- mozilla-foundation/common_voice_13_0
language:
- hi
metrics:
- wer
base_model:
- openai/whisper-small
pipeline_tag: automatic-speech-recognition
---

# Whisper Small - Hindi Automatic Speech Recognition Model

## Model Details

### Model Description

This is a fine-tuned Whisper Small model for Automatic Speech Recognition (ASR) in Hindi, developed using Parameter-Efficient Fine-Tuning (PEFT) with Low-Rank Adaptation (LoRA). Only the small LoRA adapter is trained, which keeps fine-tuning memory- and compute-efficient while adapting the base model to Hindi speech.

- **Developed by:** martin-mwiti
- **Model type:** Automatic Speech Recognition (ASR)
- **Language(s):** Hindi
- **License:** Apache-2.0
- **Finetuned from model:** openai/whisper-small

### Model Sources

- **Repository:** [GitHub/martin-mwiti/AI-Model-Hub/ASR](https://github.com/MartinMwiti/AI-Model-Hub/ASR)
- **HuggingFace Hub:** [martin-mwiti/whisper-small-hi-lora-r32-alpha64-20241231](https://huggingface.co/martin-mwiti/whisper-small-hi-lora-r32-alpha64-20241231)

## Uses

### Direct Use

This model can be used to transcribe Hindi speech from audio files. It is intended for offline automatic speech recognition, using the Whisper Small model as a base.

### Downstream Use

The model can be further fine-tuned or used as a starting point for other Hindi speech recognition applications.

### Out-of-Scope Use

- Do not use for languages other than Hindi
- Not suitable for real-time streaming audio transcription
- Avoid using in high-stakes or safety-critical applications without additional validation

## Bias, Risks, and Limitations

- Performance may vary with audio quality, speaker accent, and background noise
- The model was trained on the Common Voice dataset, which may not represent all Hindi dialects and speaking styles
- The model may reproduce biases present in the training data

### Recommendations

- Validate model performance on your specific use case
- Use in conjunction with human review for critical applications
- Be aware of potential cultural or linguistic biases

## How to Get Started with the Model

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel

# Load the processor from the base model
processor = WhisperProcessor.from_pretrained("openai/whisper-small")

# Load the base Whisper model
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Load the LoRA adapter on top of the base model (pass the adapter repo ID directly)
model = PeftModel.from_pretrained(base_model, "martin-mwiti/whisper-small-hi-lora-r32-alpha64-20241231")
model.eval()

# Use the model for inference
audio_array = ...  # Replace with your audio samples (1-D float array at 16 kHz)
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")

# Recent transformers versions accept language/task hints directly in generate()
predicted_ids = model.generate(inputs.input_features, language="hi", task="transcribe")

# Decode the transcription
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print("Transcription:", transcription)
```

## Training Details

### Training Data

- **Dataset:** Common Voice 13.0
- **Language:** Hindi
- **Splits:** Trained on the combined train and validation sets; evaluated on the test set

### Training Procedure

#### Training Hyperparameters

- **Base Model:** openai/whisper-small
- **Fine-Tuning Method:** PEFT with LoRA
- **LoRA Configuration** (see the sketch below):
  - Rank (r): 32
  - Alpha: 64
  - Target Modules: query and value projection matrices
  - Dropout: 5%
- **Training Regime:** Mixed precision (fp16)
- **Batch Size:** 8 per device
- **Learning Rate:** 1e-3
- **Warmup Steps:** 25
- **Total Training Steps:** 50
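These hyperparameters correspond to a PEFT configuration along the following lines. This is a minimal illustrative sketch, not the card author's actual training script; in particular, mapping "query and value projection matrices" to the `q_proj` and `v_proj` module names of Whisper's attention layers is an assumption.

```python
from peft import LoraConfig, get_peft_model
from transformers import WhisperForConditionalGeneration

# Values taken from the hyperparameter list above; module names are assumed
# ("query and value projection matrices" map to q_proj / v_proj in Whisper).
lora_config = LoraConfig(
    r=32,                                # LoRA rank
    lora_alpha=64,                       # LoRA scaling factor
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,                   # 5% dropout
    bias="none",
)

base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # only the LoRA weights are trainable
```

With rank 32 on the query and value projections, the trainable weights are a small fraction of the roughly 244M parameters in Whisper Small, which is what lets this fine-tuning setup run on modest hardware.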
## Evaluation

### Metrics

- **Primary Metric:** Word Error Rate (WER)

### Results

| Metric          | Value  |
|-----------------|--------|
| **Average WER** | 0.6938 |
| **Best WER**    | 0.0000 |
| **Worst WER**   | 1.6000 |

- **Evaluation Dataset:** Common Voice 13.0 Hindi Test Set
- **Number of Evaluation Samples:** 50

WER is computed per sample; values above 1.0 are possible when a hypothesis contains more errors than the reference has words.

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute).

## Citation

If you use this model, please cite the original Whisper paper and acknowledge the fine-tuning work.

**BibTeX:**

```bibtex
@misc{whisper2022,
  title={Robust Speech Recognition via Large-Scale Weak Supervision},
  author={Radford, Alec and Kim, Jong Wook and Xu, Tao and others},
  year={2022},
  eprint={2212.04356},
  archivePrefix={arXiv}
}
```

## Model Card Authors

- martin-mwiti

## Model Card Contact

For questions or feedback, please open an issue on the GitHub repository or contact the model author.
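As a supplement to the Evaluation section above, the following sketch shows one way the reported corpus-level WER could be recomputed with the `evaluate` library. It reuses the `processor` and `model` objects from the quick-start snippet, and it is illustrative only: the streaming flag, the sample selection (first 50 test clips), and the absence of text normalization are assumptions, not the original evaluation script.

```python
import evaluate
from datasets import Audio, load_dataset

# Hypothetical reproduction sketch, not the card author's original evaluation code.
wer_metric = evaluate.load("wer")

# Common Voice 13.0 is a gated dataset; this assumes its terms were accepted on the Hub.
dataset = load_dataset("mozilla-foundation/common_voice_13_0", "hi", split="test", streaming=True)
dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))

references, predictions = [], []
for i, sample in enumerate(dataset):
    if i >= 50:  # the card reports 50 evaluation samples
        break
    inputs = processor(sample["audio"]["array"], sampling_rate=16000, return_tensors="pt")
    predicted_ids = model.generate(inputs.input_features, language="hi", task="transcribe")
    predictions.append(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
    references.append(sample["sentence"])

print("Corpus WER:", wer_metric.compute(predictions=predictions, references=references))
```

The per-sample best and worst WER figures in the Results table would come from scoring each clip individually rather than pooling all 50 transcripts.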