# Model Card for Odia-English Whisper ASR with LoRA
This model is a fine-tuned version of OpenAI's [whisper-small](https://huggingface.co/openai/whisper-small) for automatic speech recognition (ASR) on an Odia-English bilingual dataset. Fine-tuning was done with LoRA (Low-Rank Adaptation) for parameter-efficient training. The model transcribes Odia speech written in Bengali script, using the standard Whisper tokenizer and feature extractor.
## Model Details
- **Developed by:** Dr. Balyogi Mohan Dash
- **Model type:** Whisper (sequence-to-sequence transformer for ASR)
- **Language(s):** Odia (written in Bengali script), English
- **License:** apache-2.0
- **Fine-tuned from model:** [openai/whisper-small](https://huggingface.co/openai/whisper-small)
### Model Sources

- **Training code:** Private repo / project (not shared in this card)
- **Dataset:** [Mohan-diffuser/odia-english-ASR](https://huggingface.co/datasets/Mohan-diffuser/odia-english-ASR)
## Uses

### Direct Use
- Automatic transcription of Odia-English speech recordings
- Educational or accessibility tools for low-resource language ASR
- Dataset bootstrapping for speech corpora in Indian languages
## How to Get Started
```python
import torch
from datasets import load_dataset
from transformers import (
    WhisperTokenizer,
    WhisperFeatureExtractor,
    WhisperForConditionalGeneration,
)
from peft import PeftModel
from scipy.signal import resample


def down_sample_audio(audio_original, original_sample_rate):
    """Resample an audio array to the 16 kHz rate Whisper expects."""
    target_sample_rate = 16000
    # Number of samples the clip should contain at the target rate
    num_samples = int(len(audio_original) * target_sample_rate / original_sample_rate)
    return resample(audio_original, num_samples)


# Whisper's tokenizer has no Odia language token, so Bengali is used,
# matching the Bengali script of the transcriptions.
tokenizer = WhisperTokenizer.from_pretrained(
    "openai/whisper-small", language="bengali", task="transcribe"
)
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-small")

# Load the base model and attach the fine-tuned LoRA adapter.
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small").to("cuda")
model = PeftModel.from_pretrained(
    model, "Mohan-diffuser/whisper-small-odia-finetuned", is_trainable=False
)
model.eval()
model.config.use_cache = True

asr_dataset = load_dataset("Mohan-diffuser/odia-english-ASR")

idx = 0
target = asr_dataset["validation"][idx]["transcription"]
audio_original = asr_dataset["validation"][idx]["audio"]["array"]
original_sample_rate = asr_dataset["validation"][idx]["audio"]["sampling_rate"]
audio_16000 = down_sample_audio(audio_original, original_sample_rate)

input_feature = feature_extractor(
    raw_speech=audio_16000, sampling_rate=16000, return_tensors="pt"
).input_features

with torch.no_grad():
    op = model.generate(input_feature.to("cuda"), language="bengali", task="transcribe")
text_pred = tokenizer.batch_decode(op, skip_special_tokens=True)[0]

print("reference :", target)
print("prediction:", text_pred)
```
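If the adapter will only ever be used for inference, the LoRA weights can optionally be folded into the base model with PEFT's standard `merge_and_unload`. This is a general PEFT convenience, not a step this card requires:

```python
# Optional: merge the LoRA adapter into the base weights so the model can be
# used (or saved) as a plain WhisperForConditionalGeneration.
merged_model = model.merge_and_unload()
```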
## Training Details

### Training Data
- Dataset: Mohan-diffuser/odia-english-ASR
- Audio: Native Odia and code-mixed English speech with manual transcriptions.
- Sampling rate: Downsampled to 16 kHz using `scipy.signal.resample`
### Training Procedure
- LoRA parameters: `r=64`, `lora_alpha=64`, `lora_dropout=0.05` (see the configuration sketch after this list)
- Scheduler: Linear warmup
- Warmup steps: 20
- Max steps: 1400
- Batch size: 8
- Gradient accumulation: 4
- Optimizer: AdamW on trainable LoRA parameters
- Eval steps: 100
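As a rough illustration, the hyperparameters above might be assembled with PEFT and `Seq2SeqTrainingArguments`. This is a hedged reconstruction, not the actual training script: `target_modules`, `learning_rate`, and `output_dir` are assumptions not stated in this card.

```python
from transformers import Seq2SeqTrainingArguments, WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Wrap the base model with LoRA adapters using the hyperparameters listed above.
# target_modules is an assumption: attention projections are the usual choice for Whisper.
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

# Training arguments matching the schedule above; output_dir and learning_rate are assumed.
training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-small-odia-lora",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,   # effective batch size of 32
    learning_rate=1e-3,              # assumption; not stated in this card
    lr_scheduler_type="linear",
    warmup_steps=20,
    max_steps=1400,
    evaluation_strategy="steps",
    eval_steps=100,
    remove_unused_columns=False,     # needed when training a PEFT-wrapped Whisper
)
```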
## Evaluation

### Dataset
- Validation split from Mohan-diffuser/odia-english-ASR
### Metrics
- CER (Character Error Rate): Computed using `jiwer.cer` (see the example below)
- The model achieved a CER of 14.14 on the validation set
- Manual predictions were logged every 100 steps for qualitative monitoring
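For reference, computing CER with `jiwer` reduces to a single call. The strings below are placeholders, not samples from the dataset:

```python
from jiwer import cer

# cer() returns the character-level edit distance divided by the reference length.
reference = "reference transcription"   # placeholder ground truth
hypothesis = "predicted transcription"  # placeholder model output
print(cer(reference, hypothesis))
```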
## Environmental Impact

- Hardware used: Single GPU (NVIDIA GeForce RTX 4060 Ti)
- Training duration: ~1400 steps of small-scale LoRA tuning
- Frameworks: PyTorch, Transformers, PEFT