Medibeng Whisper Tiny
Model Description
Medibeng Whisper Tiny is a fine-tuned version of OpenAI's Whisper Tiny model (openai/whisper-tiny) for automatic speech recognition (ASR). It transcribes and translates code-switched Bengali-English conversations into English. The model targets clinical settings and handles audio that mixes Bengali and English, which makes it suitable for transcription and translation in multilingual medical and healthcare environments.
Usage
To use the Medibeng Whisper Tiny model to translate code-switched Bengali-English conversations into English, first install the required packages (PyTorch is needed because the example builds PyTorch tensors with `return_tensors="pt"`):
```bash
pip install torch transformers librosa
```
Then run this code:
```python
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Model repository on the Hugging Face Hub
model_path = "pr0mila-gh0sh/MediBeng-Whisper-Tiny"
# Whisper language token and task token; the "translate" task produces English text
LANGUAGE = "en"
TASK = "translate"

# Load the processor (feature extractor + tokenizer) and the model
processor = WhisperProcessor.from_pretrained(model_path)
model = WhisperForConditionalGeneration.from_pretrained(model_path)

# Path to your single audio file
audio_file_path = "path_to_audio.wav"

# Load the audio as a mono waveform resampled to 16 kHz, the rate Whisper expects
audio_input, _ = librosa.load(audio_file_path, sr=16000)

# Convert the waveform into log-Mel input features
# (by default, input longer than 30 seconds is truncated)
input_features = processor(audio_input, sampling_rate=16000, return_tensors="pt").input_features

# Generate token ids; language and task set the decoder prompt for English translation
predicted_ids = model.generate(input_features, language=LANGUAGE, task=TASK)

# Decode token ids to text, dropping special tokens
translation = processor.batch_decode(predicted_ids, skip_special_tokens=True)

print("Translation:", translation[0])
```
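The example above translates a single clip. Whisper's feature extractor works on 30-second windows and truncates longer input by default, so a longer consultation recording has to be split first. The sketch below assumes naive, non-overlapping 30-second chunks; `split_into_chunks` and `translate_long_audio` are hypothetical helpers, not part of this model or of `transformers`, and cutting at fixed boundaries can split words between chunks.

```python
import numpy as np

SAMPLE_RATE = 16_000   # Whisper models expect 16 kHz audio
CHUNK_SECONDS = 30     # Whisper's feature extractor works on 30-second windows


def split_into_chunks(audio: np.ndarray, sample_rate: int = SAMPLE_RATE,
                      chunk_seconds: int = CHUNK_SECONDS) -> list[np.ndarray]:
    """Hypothetical helper: split a mono waveform into consecutive,
    non-overlapping chunks. The last chunk may be shorter; the Whisper
    feature extractor pads it to 30 seconds."""
    if audio.ndim != 1:
        raise ValueError("expected a mono (1-D) waveform")
    chunk_len = sample_rate * chunk_seconds
    return [audio[start:start + chunk_len] for start in range(0, len(audio), chunk_len)]


def translate_long_audio(audio, processor, model, language="en", task="translate"):
    """Hypothetical helper: translate each chunk separately and join the text."""
    pieces = []
    for chunk in split_into_chunks(audio):
        features = processor(chunk, sampling_rate=SAMPLE_RATE,
                             return_tensors="pt").input_features
        ids = model.generate(features, language=language, task=task)
        pieces.append(processor.batch_decode(ids, skip_special_tokens=True)[0].strip())
    return " ".join(pieces)
```

Call `translate_long_audio(audio_input, processor, model)` with the objects from the usage example. If overlapping chunks are preferred, the `transformers` automatic-speech-recognition `pipeline` accepts a `chunk_length_s` argument as an alternative.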
Key Features:
- Speech-to-text: Converts code-mixed Bengali-English audio to English text.
- Clinical Setting: Fine-tuned on a medical dataset of clinical conversations, enabling it to handle common healthcare terminology.
- Code-mixed Speech: Designed to handle code-switching between Bengali and English, which is common in multilingual regions.
Intended Use
This model is intended for use by researchers and developers working with code-mixed Bengali-English audio in the clinical domain. It is suitable for:
- Medical transcription services where conversations involve both Bengali and English.
- Healthcare voice assistants that support providers working in multilingual settings.
- Speech-to-text applications in healthcare environments, particularly for doctors and patients speaking a mix of Bengali and English.
The model works best in environments where both Bengali and English are used interchangeably, particularly in healthcare or clinical scenarios.
Training Data
The model was fine-tuned on the MediBeng dataset, which consists of code-switched Bengali-English conversations in clinical settings.
- Dataset Size: 20% of the MediBeng dataset was used for fine-tuning. The dataset is available on Hugging Face.
- Data Source: MediBeng dataset
- Data Process Source: ParquetToHuggingFace
- Data Characteristics: The dataset contains conversational speech in both Bengali and English, with a specific focus on medical terminology and clinical dialogues.
Evaluation Results
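The model's performance improved as training progressed: training loss, evaluation loss, and evaluation Word Error Rate (WER) all fell substantially, although WER did not improve at every evaluation. The Epoch column gives fractional progress through the training data, and evaluation metrics appear on every second logged row ("-" marks rows without an evaluation).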
Epoch | Training Loss | Grad Norm | Learning Rate | Eval Loss | Eval WER (%)
---|---|---|---|---|---
0.03 | 2.6213 | 61.56 | 4.80E-06 | - | -
0.07 | 1.609 | 44.09 | 9.80E-06 | 1.13 | 107.72
0.1 | 0.7685 | 52.27 | 9.47E-06 | - | -
0.13 | 0.4145 | 32.27 | 8.91E-06 | 0.37 | 47.53
0.16 | 0.3177 | 17.98 | 8.36E-06 | - | -
0.2 | 0.222 | 7.7 | 7.80E-06 | 0.1 | 45.19
0.23 | 0.0915 | 1.62 | 7.24E-06 | - | -
0.26 | 0.081 | 0.4 | 6.69E-06 | 0.04 | 38.35
0.33 | 0.0246 | 1.01 | 5.58E-06 | - | -
0.36 | 0.0212 | 2.2 | 5.02E-06 | 0.01 | 41.88
0.42 | 0.0052 | 0.13 | 3.91E-06 | - | -
0.46 | 0.0023 | 0.45 | 3.36E-06 | 0.01 | 34.07
0.52 | 0.0013 | 0.05 | 1.69E-06 | - | -
0.55 | 0.0032 | 0.11 | 1.13E-06 | 0.01 | 29.52
0.62 | 0.001 | 0.09 | 5.78E-07 | - | -
0.65 | 0.0012 | 0.08 | 2.22E-08 | 0 | 30.49
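- Training Loss: Falls from 2.62 at epoch 0.03 to about 0.001 by epoch 0.62, with small fluctuations late in training (for example, 0.0013 at epoch 0.52 and 0.0032 at epoch 0.55).
- Eval Loss: Drops from 1.13 at the first evaluation to about 0.01 by epoch 0.36 and is reported as 0 at the final evaluation, showing that the model also fits the held-out evaluation data.
- Eval WER: Falls from 107.72% at epoch 0.07 to a best of 29.52% at epoch 0.55 and ends at 30.49%. The decrease is not strictly monotonic (WER rises from 38.35% to 41.88% between epochs 0.26 and 0.36), but the overall trend shows the model getting better at producing English text from code-switched Bengali-English speech.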
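WER is the number of word substitutions, deletions, and insertions needed to turn the model output into the reference, divided by the number of reference words. Because substitutions plus deletions can never exceed the reference length, a WER above 100% (such as 107.72% at epoch 0.07) means the output contained inserted words. The sketch below is a plain word-level Levenshtein implementation, assuming whitespace tokenization with no lowercasing or punctuation stripping; the text normalization used for the reported numbers is not stated in this card, and `word_error_rate` is a hypothetical helper, not the author's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (substitutions + deletions + insertions) / reference words.

    Assumes whitespace tokenization and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion of ref[i-1]
                          curr[j - 1] + 1,      # insertion of hyp[j-1]
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("chest pain", "chest pain since two days")` returns 1.5: three inserted words over two reference words, which the table would report as 150%.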
Limitations
- Accents: The model may struggle with very strong regional accents or non-native speakers of Bengali and English.
- Specialized Terms: The model may not perform well with highly specialized medical terms or out-of-domain speech.
- Multilingual Support: The model is designed only for Bengali and English; other languages are not supported.
Ethical Considerations
- Biases: The training data may contain biases based on the demographics of the speakers, such as gender, age, and accent.
- Misuse: Like any ASR system, this model could be misused to create fake transcripts of audio recordings, potentially leading to privacy and security concerns.
- Fairness: Ensure the model is used in contexts where fairness and ethical considerations are taken into account, particularly in clinical environments.
Blog Post
I’ve written a detailed blog post on Medium about MediBeng Whisper-Tiny and how it translates code-switched Bengali-English speech in healthcare. In this post, I discuss the dataset creation, model fine-tuning, and how this can improve healthcare transcription. Read the full article here: MediBeng Whisper-Tiny: Translating Code-Switched Bengali-English Speech for Healthcare
Citation for Research Use
If you use Medibeng Whisper-Tiny or the MediBeng dataset for your research or project, please cite the following:
For Medibeng Whisper-Tiny Model (Fine-Tuned Model):
```bibtex
@misc{pr0mila2025medibengwhisper,
  author = {Promila Ghosh},
  title = {Medibeng Whisper-Tiny: Code-Switched Bengali-English Speech Translation for Clinical Settings},
  year = {2025},
  howpublished = {\url{https://huggingface.co/pr0mila-gh0sh/MediBeng-Whisper-Tiny}},
}
```
Base Model
openai/whisper-tiny