# Luganda Whisper ASR with a Language Model
This is a fine-tuned Whisper-small model for Luganda ASR, trained on the Common Voice and FLEURS datasets and enhanced with a 5-gram KenLM language model for improved transcription quality.
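If you want to inspect the 5-gram model on its own, it can be loaded with the `kenlm` Python bindings. This is an optional sanity check, not part of the decoding pipeline; `5gram.bin` is the same file used in the usage example below, and the sentence being scored is just a placeholder.

```python
import kenlm

# Load the binary 5-gram model shipped with this repository.
lm = kenlm.Model("5gram.bin")

# Score a (placeholder) space-separated sentence.
# KenLM returns a total log10 probability; bos/eos add sentence-boundary tokens.
print(lm.score("webale nnyo", bos=True, eos=True))
```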
> ⚠️ **Note:** OpenAI's Whisper does not officially support `"lg"` (Luganda) as a recognized language code. To bypass this tokenizer restriction, we use `"sw"` (Swahili) as a placeholder. This workaround does not affect the model's ability to transcribe Luganda, since both the model and the language model are fine-tuned specifically for Luganda; it is only needed to satisfy Whisper's internal tokenizer constraints.
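You can confirm the restriction yourself: the openai-whisper package keeps its supported language codes in the `whisper.tokenizer.LANGUAGES` table, and `"lg"` is not among them (a quick check, assuming a standard `openai-whisper` install):

```python
import whisper.tokenizer

# "lg" (Luganda) is absent from Whisper's built-in language table,
# while "sw" (Swahili) is present -- hence the placeholder code.
print("lg" in whisper.tokenizer.LANGUAGES)  # False
print("sw" in whisper.tokenizer.LANGUAGES)  # True ("swahili")
```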
## Usage (with whisper-lm)
```bash
git clone https://huggingface.co/sulaimank/whisper-small-lg-lm
cd whisper-small-lg-lm
pip install -r requirements.txt
```
```python
import whisper
from whisper_decoder_with_lm import LMOptions

model_path = "whisper-small-lg.pt"
lm_path = "5gram.bin"

# Set LM parameters.
# alpha and beta were tuned to minimize WER on a held-out subset
# (here: 2,000 samples from Common Voice).
LMOptions().lm_path = lm_path
LMOptions().lm_alpha = 0.0211
LMOptions().lm_beta = 0.0119

# Whisper decode options
decode_options = {
    "language": "sw",  # Swahili tokenizer as a workaround for Luganda (see note above)
    "without_timestamps": True,
    "temperature": 0.0,
    "beam_size": 5,
}

# Transcribe audio
model = whisper.load_model(model_path)
result = model.transcribe("your_audio.wav", **decode_options)
print("Transcription:", result["text"])
```