Whisper Small - Hindi Automatic Speech Recognition Model

Model Details

Model Description

This is a Whisper Small model fine-tuned for Hindi Automatic Speech Recognition (ASR) using Parameter-Efficient Fine-Tuning (PEFT) with Low-Rank Adaptation (LoRA). Only the LoRA adapter weights are published, so the adapter must be loaded on top of openai/whisper-small. The model is intended to transcribe Hindi speech; measured accuracy is reported in the Evaluation section.

Developed by: martin-mwiti
Model type: Automatic Speech Recognition (ASR)
Language(s): Hindi
License: Apache-2.0
Finetuned from model: openai/whisper-small

Model Sources

Repository: GitHub/martin-mwiti/AI-Model-Hub/ASR
HuggingFace Hub: martin-mwiti/whisper-small-hi-lora-r32-alpha64-20241231

Uses

Direct Use

This model transcribes Hindi speech from audio files. Input audio should be mono and sampled at 16 kHz, the rate that the Whisper feature extractor expects.
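
Whisper operates on input windows of up to 30 seconds, so longer recordings are usually split before transcription. The following is a minimal sketch of fixed-length splitting, not part of the original card; the function name `split_into_windows` and the silent test waveform are illustrative assumptions.

```python
import numpy as np

SAMPLE_RATE = 16_000   # Whisper expects 16 kHz mono audio
WINDOW_SECONDS = 30    # length of Whisper's input window


def split_into_windows(audio: np.ndarray,
                       sample_rate: int = SAMPLE_RATE,
                       window_seconds: int = WINDOW_SECONDS) -> list:
    """Split a 1-D waveform into consecutive windows of at most `window_seconds`."""
    if audio.ndim != 1:
        raise ValueError("expected a mono (1-D) waveform")
    window = sample_rate * window_seconds
    return [audio[start:start + window] for start in range(0, len(audio), window)]


# 70 seconds of silence stands in for a real recording (hypothetical input).
waveform = np.zeros(70 * SAMPLE_RATE, dtype=np.float32)
windows = split_into_windows(waveform)
print([len(w) / SAMPLE_RATE for w in windows])  # [30.0, 30.0, 10.0]
```

Cutting at fixed boundaries can split a word in two; overlapping windows or splitting at silences would likely give cleaner transcripts, at the cost of extra logic to merge the results.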

Downstream Use

The model can be further fine-tuned or used as a starting point for other Hindi speech recognition applications.

Out-of-Scope Use

Transcribing languages other than Hindi
Real-time streaming audio transcription
High-stakes or safety-critical applications, unless the model has been validated separately for that setting

Bias, Risks, and Limitations

Performance may vary depending on audio quality, accent, and background noise
Trained on the Common Voice 13.0 dataset, which may not represent all Hindi dialects and speaking styles
May reproduce biases present in the training data

Recommendations

Validate model performance on your specific use case
Use in conjunction with human review for critical applications
Be aware of potential cultural or linguistic biases

How to Get Started with the Model

import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel

base_model_id = "openai/whisper-small"
adapter_id = "martin-mwiti/whisper-small-hi-lora-r32-alpha64-20241231"

# Load the processor (feature extractor + tokenizer) from the base model
processor = WhisperProcessor.from_pretrained(base_model_id)

# Load the base Whisper model
base_model = WhisperForConditionalGeneration.from_pretrained(base_model_id)

# Attach the LoRA adapter; from_pretrained expects the adapter repo ID or path,
# not a PeftConfig object
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Use the model for inference
audio_array = ...  # Replace with a mono float waveform sampled at 16 kHz
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")

# On recent transformers versions you can also pass language="hi" and
# task="transcribe" to generate() to force Hindi transcription
with torch.no_grad():
    predicted_ids = model.generate(input_features=inputs.input_features)

# Decode the transcription
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

print("Transcription:", transcription)
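
The `audio_array` placeholder above needs a mono waveform at 16 kHz. One way to obtain it from a WAV file, using only the standard library and NumPy, might look like the sketch below. The helper name `load_wav_16k` is hypothetical, it handles only 16-bit PCM, and the linear-interpolation resampling is a crude stand-in with no anti-aliasing filter; libraries such as librosa or torchaudio provide proper resamplers.

```python
import wave

import numpy as np

TARGET_RATE = 16_000  # sampling rate expected by the Whisper processor


def load_wav_16k(path: str) -> np.ndarray:
    """Read a 16-bit PCM WAV file and return mono float32 samples at 16 kHz."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())

    # Convert little-endian int16 to floats in [-1, 1), then downmix to mono.
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0
    samples = samples.reshape(-1, channels).mean(axis=1)

    if rate != TARGET_RATE and len(samples) > 0:
        # Crude resampling by linear interpolation (assumption: good enough
        # for a quick test, not for production use).
        n_out = int(round(len(samples) * TARGET_RATE / rate))
        old_t = np.arange(len(samples)) / rate
        new_t = np.arange(n_out) / TARGET_RATE
        samples = np.interp(new_t, old_t, samples).astype(np.float32)
    return samples
```

With this helper, `audio_array = load_wav_16k("sample.wav")` would replace the placeholder, where `sample.wav` is a hypothetical file name.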

Training Details

Training Data

Dataset: Common Voice 13.0
Language: Hindi
Splits: Trained on combined train and validation sets, tested on test set

Training Procedure

Training Hyperparameters

Base Model: openai/whisper-small
Fine-Tuning Method: PEFT with LoRA
LoRA Configuration:
- Rank (r): 32
- Alpha: 64
- Target Modules: query and value projection matrices
- Dropout: 5%
Training Regime: Mixed precision (fp16)
Batch Size: 8 per device
Learning Rate: 1e-3
Warmup Steps: 25
Total Training Steps: 50
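
To make the LoRA settings above concrete, the sketch below shows how rank and alpha combine into the update scaling (alpha / r) and estimates how many parameters the adapter trains. The Whisper Small dimensions (hidden size 768, 12 encoder and 12 decoder layers) and the assumption that the query and value projections of every attention block are adapted are not stated in this card. Dropout is omitted for brevity. In the peft library these settings would typically be expressed as `LoraConfig(r=32, lora_alpha=64, target_modules=["q_proj", "v_proj"], lora_dropout=0.05)`; the exact configuration used for this model is not shown here.

```python
import numpy as np

R, ALPHA = 32, 64        # rank and alpha from the configuration above
SCALING = ALPHA / R      # LoRA scales the low-rank update by alpha / r

# Assumed Whisper Small dimensions (not stated in this card).
D_MODEL = 768
ENCODER_LAYERS = 12
DECODER_LAYERS = 12

# Each encoder layer has one attention block; each decoder layer has two
# (self-attention and cross-attention). Query and value are adapted in each.
attention_blocks = ENCODER_LAYERS + 2 * DECODER_LAYERS
adapted_matrices = 2 * attention_blocks

# One adapter adds A (r x d_in) and B (d_out x r).
params_per_adapter = R * (D_MODEL + D_MODEL)
trainable_params = adapted_matrices * params_per_adapter
print(SCALING, adapted_matrices, trainable_params)  # 2.0 72 3538944


def lora_forward(x, W, A, B, scaling=SCALING):
    """Apply the frozen weight W plus the scaled low-rank update B @ A."""
    return x @ W.T + scaling * (x @ A.T @ B.T)


rng = np.random.default_rng(0)
W = rng.standard_normal((D_MODEL, D_MODEL))
A = rng.standard_normal((R, D_MODEL)) * 0.01
B = np.zeros((D_MODEL, R))  # B starts at zero, so the adapter is initially a no-op
x = rng.standard_normal((1, D_MODEL))
print(np.allclose(lora_forward(x, W, A, B), x @ W.T))  # True
```

Under these assumptions the adapter trains roughly 3.5 million parameters, a small fraction of the base model, which is why only the adapter needs to be stored and shared.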

Evaluation

Metrics

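Primary Metric: Word Error Rate (WER), reported as a fraction (lower is better; values above 1.0 are possible when the output contains more errors than the reference has words)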

Results

Metric	Value
Average WER	0.6938
Best WER	0.0000
Worst WER	1.6000

Evaluation Dataset: Common Voice 13.0 Hindi Test Set
Number of Evaluation Samples: 50
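
For readers who want to reproduce this kind of measurement, the sketch below computes WER as word-level edit distance divided by the number of reference words. The card does not say how the scores were computed or whether text was normalized first, so whitespace tokenization and the reading of "Average WER" as the mean of per-utterance scores are assumptions. The placeholder strings are made up and are not drawn from the evaluation set.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,              # deletion
                          curr[j - 1] + 1,          # insertion
                          prev[j - 1] + (r != h))   # substitution or match
        prev = curr
    return prev[-1] / len(ref)


reference = "w1 w2 w3 w4 w5"
perfect = "w1 w2 w3 w4 w5"
noisy = "x1 x2 x3 x4 x5 x6 x7 x8"  # 5 substitutions + 3 insertions

scores = [word_error_rate(reference, perfect), word_error_rate(reference, noisy)]
print(scores)                      # [0.0, 1.6]
print(sum(scores) / len(scores))   # mean of per-utterance WER: 0.8
```

The second example shows how a score such as 1.6 can arise: eight errors against a five-word reference.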

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator.

Citation

If you use this model, please cite the original Whisper paper and acknowledge the fine-tuning work.

BibTeX:

@misc{whisper2022,
  title={Robust Speech Recognition via Large-Scale Weak Supervision},
  author={Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  year={2022},
  eprint={2212.04356},
  archivePrefix={arXiv}
}

Model Card Authors

martin-mwiti

Model Card Contact

For questions or feedback, please open an issue on the GitHub repository or contact the model author.
