---
library_name: transformers
tags:
- whisper
- hindi
- asr
- peft
- lora
license: apache-2.0
datasets:
- mozilla-foundation/common_voice_13_0
language:
- hi
metrics:
- wer
base_model:
- openai/whisper-small
pipeline_tag: automatic-speech-recognition
---

# Whisper Small - Hindi Automatic Speech Recognition Model

## Model Details

### Model Description

This is a fine-tuned Whisper Small model for Automatic Speech Recognition (ASR) in Hindi, developed using Parameter-Efficient Fine-Tuning (PEFT) with Low-Rank Adaptation (LoRA). Only the small LoRA adapter is trained, which keeps fine-tuning memory- and compute-efficient while adapting the base model to Hindi speech.

- **Developed by:** martin-mwiti
- **Model type:** Automatic Speech Recognition (ASR)
- **Language(s):** Hindi
- **License:** Apache-2.0
- **Finetuned from model:** openai/whisper-small

### Model Sources

- **Repository:** [GitHub/martin-mwiti/AI-Model-Hub/ASR](https://github.com/MartinMwiti/AI-Model-Hub/ASR)
- **HuggingFace Hub:** [martin-mwiti/whisper-small-hi-lora-r32-alpha64-20241231](https://huggingface.co/martin-mwiti/whisper-small-hi-lora-r32-alpha64-20241231)

## Uses

### Direct Use

This model can be used to transcribe Hindi speech from audio files. It is intended for offline automatic speech recognition, using the Whisper Small model as a base.

### Downstream Use

The model can be further fine-tuned or used as a starting point for other Hindi speech recognition applications.

### Out-of-Scope Use

- Do not use for languages other than Hindi
- Not suitable for real-time streaming audio transcription
- Avoid using in high-stakes or safety-critical applications without additional validation

## Bias, Risks, and Limitations

- Performance may vary with audio quality, speaker accent, and background noise
- The model was trained on the Common Voice dataset, which may not represent all Hindi dialects and speaking styles
- The model may reproduce biases present in the training data

### Recommendations

- Validate model performance on your specific use case
- Use in conjunction with human review for critical applications
- Be aware of potential cultural or linguistic biases

## How to Get Started with the Model

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel

# Load the processor from the base model
processor = WhisperProcessor.from_pretrained("openai/whisper-small")

# Load the base Whisper model
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Load the LoRA adapter on top of the base model (pass the adapter repo ID directly)
model = PeftModel.from_pretrained(base_model, "martin-mwiti/whisper-small-hi-lora-r32-alpha64-20241231")
model.eval()

# Use the model for inference
audio_array = ...  # Replace with your audio samples (1-D float array at 16 kHz)
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")

# Recent transformers versions accept language/task hints directly in generate()
predicted_ids = model.generate(inputs.input_features, language="hi", task="transcribe")

# Decode the transcription
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print("Transcription:", transcription)
```

## Training Details

### Training Data

- **Dataset:** Common Voice 13.0
- **Language:** Hindi
- **Splits:** Trained on the combined train and validation sets; evaluated on the test set

### Training Procedure

#### Training Hyperparameters

- **Base Model:** openai/whisper-small
- **Fine-Tuning Method:** PEFT with LoRA
- **LoRA Configuration** (see the sketch below):
  - Rank (r): 32
  - Alpha: 64
  - Target Modules: query and value projection matrices
  - Dropout: 5%
- **Training Regime:** Mixed precision (fp16)
- **Batch Size:** 8 per device
- **Learning Rate:** 1e-3
- **Warmup Steps:** 25
- **Total Training Steps:** 50
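These hyperparameters correspond to a PEFT configuration along the following lines. This is a minimal illustrative sketch, not the card author's actual training script; in particular, mapping "query and value projection matrices" to the `q_proj` and `v_proj` module names of Whisper's attention layers is an assumption.

```python
from peft import LoraConfig, get_peft_model
from transformers import WhisperForConditionalGeneration

# Values taken from the hyperparameter list above; module names are assumed
# ("query and value projection matrices" map to q_proj / v_proj in Whisper).
lora_config = LoraConfig(
    r=32,                                # LoRA rank
    lora_alpha=64,                       # LoRA scaling factor
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,                   # 5% dropout
    bias="none",
)

base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # only the LoRA weights are trainable
```

With rank 32 on the query and value projections, the trainable weights are a small fraction of the roughly 244M parameters in Whisper Small, which is what lets this fine-tuning setup run on modest hardware.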
## Evaluation

### Metrics

- **Primary Metric:** Word Error Rate (WER)

### Results

| Metric          | Value  |
|-----------------|--------|
| **Average WER** | 0.6938 |
| **Best WER**    | 0.0000 |
| **Worst WER**   | 1.6000 |

- **Evaluation Dataset:** Common Voice 13.0 Hindi Test Set
- **Number of Evaluation Samples:** 50

WER is computed per sample; values above 1.0 are possible when a hypothesis contains more errors than the reference has words.

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute).

## Citation

If you use this model, please cite the original Whisper paper and acknowledge the fine-tuning work.

**BibTeX:**

```bibtex
@misc{whisper2022,
  title={Robust Speech Recognition via Large-Scale Weak Supervision},
  author={Radford, Alec and Kim, Jong Wook and Xu, Tao and others},
  year={2022},
  eprint={2212.04356},
  archivePrefix={arXiv}
}
```

## Model Card Authors

- martin-mwiti

## Model Card Contact

For questions or feedback, please open an issue on the GitHub repository or contact the model author.
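As a supplement to the Evaluation section above, the following sketch shows one way the reported corpus-level WER could be recomputed with the `evaluate` library. It reuses the `processor` and `model` objects from the quick-start snippet, and it is illustrative only: the streaming flag, the sample selection (first 50 test clips), and the absence of text normalization are assumptions, not the original evaluation script.

```python
import evaluate
from datasets import Audio, load_dataset

# Hypothetical reproduction sketch, not the card author's original evaluation code.
wer_metric = evaluate.load("wer")

# Common Voice 13.0 is a gated dataset; this assumes its terms were accepted on the Hub.
dataset = load_dataset("mozilla-foundation/common_voice_13_0", "hi", split="test", streaming=True)
dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))

references, predictions = [], []
for i, sample in enumerate(dataset):
    if i >= 50:  # the card reports 50 evaluation samples
        break
    inputs = processor(sample["audio"]["array"], sampling_rate=16000, return_tensors="pt")
    predicted_ids = model.generate(inputs.input_features, language="hi", task="transcribe")
    predictions.append(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
    references.append(sample["sentence"])

print("Corpus WER:", wer_metric.compute(predictions=predictions, references=references))
```

The per-sample best and worst WER figures in the Results table would come from scoring each clip individually rather than pooling all 50 transcripts.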