Whisper Small - Hindi Automatic Speech Recognition Model
Model Details
Model Description
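This is a fine-tuned Whisper Small model for Automatic Speech Recognition (ASR) in Hindi, developed using Parameter-Efficient Fine-Tuning (PEFT) with Low-Rank Adaptation (LoRA). LoRA keeps the base model's weights frozen and trains small low-rank adapter matrices instead, so only a small fraction of the parameters is updated during fine-tuning. The aim is more accurate Hindi transcription than the base model provides.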
- Developed by: martin-mwiti
- Model type: Automatic Speech Recognition (ASR)
- Language(s): Hindi
- License: Apache-2.0
- Finetuned from model: openai/whisper-small
Model Sources
- Repository: GitHub/martin-mwiti/AI-Model-Hub/ASR
- Hugging Face Hub: martin-mwiti/whisper-small-hi-lora-r32-alpha64-20241231
Uses
Direct Use
This model can be used to transcribe Hindi speech audio files, with openai/whisper-small as the base model. Input audio should be mono and sampled at 16 kHz, the rate the Whisper feature extractor expects.
Downstream Use
The model can be further fine-tuned or used as a starting point for other Hindi speech recognition applications.
Out-of-Scope Use
- Do not use for languages other than Hindi
- Not suitable for real-time streaming audio transcription
- Avoid using in high-stakes or safety-critical applications without additional validation
Bias, Risks, and Limitations
- Performance may vary depending on audio quality, accent, and background noise
- Trained on the Common Voice 13.0 dataset, which may not represent all Hindi dialects and speaking styles
- May have biases present in the training data
Recommendations
- Validate model performance on your specific use case
- Use in conjunction with human review for critical applications
- Be aware of potential cultural or linguistic biases
How to Get Started with the Model
```python
import librosa  # used here only to load and resample a local audio file
import torch
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

ADAPTER_ID = "martin-mwiti/whisper-small-hi-lora-r32-alpha64-20241231"

# Load the processor (feature extractor + tokenizer) from the base model
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
# Load the base Whisper model
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
# Attach the LoRA adapter: PeftModel.from_pretrained takes the adapter repo ID or path, not a PeftConfig
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
model.eval()

# Whisper expects 16 kHz mono audio; "sample_hindi.wav" is a placeholder path
audio_array, _ = librosa.load("sample_hindi.wav", sr=16000, mono=True)
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")

# Generate token IDs, forcing Hindi transcription instead of language detection or translation
with torch.no_grad():
    predicted_ids = model.generate(
        input_features=inputs.input_features, language="hindi", task="transcribe"
    )

# Decode the transcription
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print("Transcription:", transcription)
```
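The Whisper feature extractor pads or truncates each input to 30 seconds (480,000 samples at 16 kHz), so with default settings anything after the first 30 seconds of a longer recording is not transcribed. A minimal sketch of splitting a long 16 kHz recording into windows of at most 30 seconds before processing; `split_into_windows` is a hypothetical helper, not part of transformers. Fixed-length cuts can split a word at a window boundary; the transformers ASR pipeline's `chunk_length_s` option is an alternative that handles chunking for you.

```python
from typing import List, Sequence

SAMPLING_RATE = 16_000                            # Whisper's expected input rate (Hz)
WINDOW_SECONDS = 30                               # Whisper's fixed input length
WINDOW_SAMPLES = SAMPLING_RATE * WINDOW_SECONDS   # 480,000 samples


def split_into_windows(
    audio: Sequence[float], window_samples: int = WINDOW_SAMPLES
) -> List[Sequence[float]]:
    """Hypothetical helper: cut a mono 16 kHz signal into consecutive windows of at most window_samples."""
    if window_samples <= 0:
        raise ValueError("window_samples must be positive")
    # Slicing works the same way for Python lists and NumPy arrays
    return [audio[start:start + window_samples] for start in range(0, len(audio), window_samples)]


if __name__ == "__main__":
    # 75 seconds of silence stands in for a real recording
    long_audio = [0.0] * (75 * SAMPLING_RATE)
    windows = split_into_windows(long_audio)
    print([len(w) / SAMPLING_RATE for w in windows])  # window durations in seconds
```

Each window can then be passed through the processor and `model.generate` separately, and the decoded strings joined.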
Training Details
Training Data
- Dataset: Common Voice 13.0
- Language: Hindi
- Splits: Trained on combined train and validation sets, tested on test set
Training Procedure
Training Hyperparameters
- Base Model: openai/whisper-small
- Fine-Tuning Method: PEFT with LoRA
- LoRA Configuration:
  - Rank (r): 32
  - Alpha: 64
  - Target Modules: query and value projection matrices
  - Dropout: 5%
- Training Regime: Mixed precision (fp16)
- Batch Size: 8 per device
- Learning Rate: 1e-3
- Warmup Steps: 25
- Total Training Steps: 50
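A worked sketch of what these hyperparameters imply. It rests on assumptions the card does not confirm: standard LoRA scaling of alpha/r (not rank-stabilized LoRA); whisper-small dimensions (d_model = 768, 12 encoder and 12 decoder layers); adapters on the `q_proj` and `v_proj` layers of every encoder self-attention block and every decoder self- and cross-attention block, which is what `target_modules=["q_proj", "v_proj"]` would match in PEFT; a single training device; and the linear warmup followed by linear decay that the Hugging Face Trainer uses by default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSetup:
    rank: int = 32
    alpha: int = 64
    d_model: int = 768        # whisper-small hidden size; assumes every adapted layer maps 768 -> 768
    encoder_layers: int = 12
    decoder_layers: int = 12

    @property
    def scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update B @ A by alpha / r
        return self.alpha / self.rank

    @property
    def adapted_modules(self) -> int:
        # Encoder: self-attention q and v (2 per layer)
        # Decoder: self-attention and cross-attention q and v (4 per layer)
        return 2 * self.encoder_layers + 4 * self.decoder_layers

    @property
    def params_per_module(self) -> int:
        # A has shape (r, d_in) and B has shape (d_out, r)
        return self.rank * (self.d_model + self.d_model)

    @property
    def trainable_params(self) -> int:
        return self.adapted_modules * self.params_per_module


def linear_warmup_lr(step: int, peak_lr: float = 1e-3, warmup: int = 25, total: int = 50) -> float:
    """Linear warmup to peak_lr, then linear decay to 0 (assumed Trainer default schedule)."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    cfg = LoraSetup()
    print("scaling alpha/r:", cfg.scaling)
    print("adapted modules:", cfg.adapted_modules)
    print("trainable LoRA params:", cfg.trainable_params)
    # Total steps x per-device batch size, assuming one device
    print("training examples seen:", 50 * 8)
    print([round(linear_warmup_lr(s), 6) for s in (0, 12, 25, 40, 50)])
```

In PEFT these settings would correspond to `LoraConfig(r=32, lora_alpha=64, target_modules=["q_proj", "v_proj"], lora_dropout=0.05)`. Calling `print_trainable_parameters()` on the loaded `PeftModel` is one way to check the estimated count against the published adapter.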
Evaluation
Metrics
- Primary Metric: Word Error Rate (WER)
Results
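| Metric | Value |
|---|---|
| Average WER | 0.6938 |
| Best per-sample WER | 0.0000 |
| Worst per-sample WER | 1.6000 |

Because inserted words count as errors, a transcription much longer than its reference can score a WER above 1.0, as the worst sample does.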
- Evaluation Dataset: Common Voice 13.0 Hindi Test Set
- Number of Evaluation Samples: 50
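WER is the word-level edit distance between hypothesis and reference (substitutions S, deletions D, insertions I) divided by the number of reference words N, i.e. WER = (S + D + I) / N. The card does not say how "Average WER" was aggregated, so this minimal pure-Python sketch computes both the mean of per-sample scores and the corpus-level ratio (total errors over total reference words), which can differ substantially. Whitespace tokenization with no text normalization is an assumption; the Hindi sentences are made-up toy examples.

```python
from typing import List, Sequence, Tuple


def word_edit_distance(reference: Sequence[str], hypothesis: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions, and insertions turning reference into hypothesis."""
    prev = list(range(len(hypothesis) + 1))  # distances for an empty reference prefix
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[-1]


def sample_wer(reference: str, hypothesis: str) -> float:
    """WER of a single utterance: edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hyp) / len(ref)


def aggregate_wer(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (mean of per-sample WER, corpus WER = total errors / total reference words)."""
    per_sample = [sample_wer(r, h) for r, h in pairs]
    errors = sum(word_edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return sum(per_sample) / len(per_sample), errors / words


if __name__ == "__main__":
    pairs = [
        ("मैं घर जा रहा हूँ", "मैं घर जा रहा हूँ"),   # exact match -> 0.0
        ("नमस्ते", "नमस्ते दोस्त आप कैसे हैं"),      # 4 insertions on a 1-word reference -> 4.0
    ]
    print([sample_wer(r, h) for r, h in pairs])
    print(aggregate_wer(pairs))  # mean of per-sample WER vs corpus WER
```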
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator.
Citation
If you use this model, please cite the original Whisper paper and acknowledge the fine-tuning work.
BibTeX:
```bibtex
@misc{whisper2022,
  title={Robust Speech Recognition via Large-Scale Weak Supervision},
  author={Radford, Alec and Kim, Jong Wook and Xu, Tao and others},
  publisher={arXiv},
  year={2022}
}
```
Model Card Authors
- martin-mwiti
Model Card Contact
For questions or feedback, please open an issue on the GitHub repository or contact the model author.