Whisper Small - Hindi Automatic Speech Recognition Model

Model Details

Model Description

This is a fine-tuned Whisper Small model for Automatic Speech Recognition (ASR) in Hindi, developed using Parameter-Efficient Fine-Tuning (PEFT) with Low-Rank Adaptation (LoRA). LoRA keeps the base model weights frozen and trains only small low-rank adapter matrices, so the model adapts to Hindi speech at a fraction of the cost of full fine-tuning.

  • Developed by: martin-mwiti
  • Model type: Automatic Speech Recognition (ASR)
  • Language(s): Hindi
  • License: Apache-2.0
  • Finetuned from model: openai/whisper-small

Uses

Direct Use

This model can be used to transcribe Hindi speech from audio files. It expects 16 kHz mono input, the format consumed by the underlying Whisper Small feature extractor.

Downstream Use

The model can be further fine-tuned or used as a starting point for other Hindi speech recognition applications.
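
For example, the published adapter can be reloaded in trainable mode to continue fine-tuning on additional Hindi data. A minimal sketch, assuming the peft library's is_trainable flag:

from transformers import WhisperForConditionalGeneration
from peft import PeftModel

# Attach the adapter to the base model with its LoRA weights left unfrozen
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model = PeftModel.from_pretrained(
    base_model,
    "martin-mwiti/whisper-small-hi-lora-r32-alpha64-20241231",
    is_trainable=True,  # keep the adapter parameters trainable
)
model.print_trainable_parameters()  # only the LoRA weights should be listed as trainable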

Out-of-Scope Use

  • Do not use for languages other than Hindi
  • Not suitable for real-time streaming audio transcription
  • Avoid using in high-stakes or safety-critical applications without additional validation

Bias, Risks, and Limitations

  • Performance may vary depending on audio quality, accent, and background noise
  • Trained on Common Voice dataset, which may not represent all Hindi dialects and speaking styles
  • May have biases present in the training data

Recommendations

  • Validate model performance on your specific use case
  • Use in conjunction with human review for critical applications
  • Be aware of potential cultural or linguistic biases

How to Get Started with the Model

from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel

# Load the processor from the base model
processor = WhisperProcessor.from_pretrained("openai/whisper-small")

# Load the base Whisper model
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Load the LoRA adapter weights on top of the base model
model = PeftModel.from_pretrained(base_model, "martin-mwiti/whisper-small-hi-lora-r32-alpha64-20241231")
model.eval()

# Use the model for inference (audio must be a 16 kHz mono waveform)
audio_array = ...  # Replace with your audio array
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")

# Force Hindi transcription so Whisper does not auto-detect the language
forced_decoder_ids = processor.get_decoder_prompt_ids(language="hindi", task="transcribe")
predicted_ids = model.generate(inputs.input_features, forced_decoder_ids=forced_decoder_ids)

# Decode the transcription
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

print("Transcription:", transcription)

Training Details

Training Data

  • Dataset: Common Voice 13.0
  • Language: Hindi
  • Splits: trained on the combined train and validation sets, evaluated on the test set (a loading sketch follows this list)
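
A minimal sketch of loading these splits with the Hugging Face datasets library, assuming the mozilla-foundation/common_voice_13_0 hub id (the dataset is gated and requires accepting its terms on the Hub):

from datasets import load_dataset, Audio

# Combine the train and validation splits for training; keep the test split for evaluation
train_data = load_dataset("mozilla-foundation/common_voice_13_0", "hi", split="train+validation")
test_data = load_dataset("mozilla-foundation/common_voice_13_0", "hi", split="test")

# Resample audio to the 16 kHz rate expected by Whisper's feature extractor
train_data = train_data.cast_column("audio", Audio(sampling_rate=16000))
test_data = test_data.cast_column("audio", Audio(sampling_rate=16000))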

Training Procedure

Training Hyperparameters

  • Base Model: openai/whisper-small
  • Fine-Tuning Method: PEFT with LoRA (a configuration sketch follows this list)
  • LoRA Configuration:
    • Rank (r): 32
    • Alpha: 64
    • Target Modules: query and value projection matrices
    • Dropout: 5%
  • Training Regime: Mixed precision (fp16)
  • Batch Size: 8 per device
  • Learning Rate: 1e-3
  • Warmup Steps: 25
  • Total Training Steps: 50
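
A minimal sketch of how this configuration might be expressed with peft.LoraConfig and transformers.Seq2SeqTrainingArguments; the q_proj/v_proj module names and the output directory are assumptions, not values taken from the actual training script:

from peft import LoraConfig
from transformers import Seq2SeqTrainingArguments

# LoRA configuration matching the hyperparameters listed above
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],  # assumed names of Whisper's query/value projections
    lora_dropout=0.05,
)

# Training arguments matching the listed regime (output_dir is a placeholder)
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-hi-lora",
    per_device_train_batch_size=8,
    learning_rate=1e-3,
    warmup_steps=25,
    max_steps=50,
    fp16=True,
)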

Evaluation

Metrics

  • Primary Metric: Word Error Rate (WER); a computation sketch follows this list
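
A minimal sketch of computing WER with the Hugging Face evaluate library (the prediction and reference strings are placeholders):

import evaluate

# WER = (substitutions + insertions + deletions) / number of reference words
wer_metric = evaluate.load("wer")
wer = wer_metric.compute(
    predictions=["predicted transcription"],
    references=["reference transcription"],
)
print(f"WER: {wer:.4f}")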

Results

Metric        Value
Average WER   0.6938
Best WER      0.0000
Worst WER     1.6000

Per-sample WER can exceed 1.0 when a hypothesis contains more errors than the reference has words, which is why the worst case is above 1.

  • Evaluation Dataset: Common Voice 13.0 Hindi Test Set
  • Number of Evaluation Samples: 50

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator (https://mlco2.github.io/impact).

Citation

If you use this model, please cite the original Whisper paper and acknowledge the fine-tuning work.

BibTeX:

@misc{whisper2022,
  title={Robust Speech Recognition via Large-Scale Weak Supervision},
  author={Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  year={2022},
  eprint={2212.04356},
  archivePrefix={arXiv}
}

Model Card Authors

  • martin-mwiti

Model Card Contact

For questions or feedback, please open an issue on the GitHub repository or contact the model author.
