Whisper Xhosa ASR Models
This is a collection of Whisper models for transcribing Xhosa (isiXhosa) speech in audio and video recordings.
This model is a fine-tuned version of OpenAI's Whisper-small, optimized for isiXhosa automatic speech recognition (ASR). It was fine-tuned on the NCHLT isiXhosa Speech Corpus (De Vries et al., 2014) to improve its transcription accuracy on isiXhosa speech.
To use this model for inference:
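from transformers import WhisperForConditionalGeneration, WhisperProcessor
import librosa  # one option for loading audio; assumed installed (pip install librosa)
import torch
# Load model and processor
model = WhisperForConditionalGeneration.from_pretrained("TheirStory-Inc/whisper-small-xhosa")
processor = WhisperProcessor.from_pretrained("TheirStory-Inc/whisper-small-xhosa")
# Load your audio file as mono and resample it to the 16 kHz rate Whisper expects
# ("path/to/audio.wav" is a placeholder for your own file)
audio_input, _ = librosa.load("path/to/audio.wav", sr=16000, mono=True)
# Convert the waveform to log-Mel input features
# (Whisper reads at most 30 seconds per input; longer audio is truncated)
input_features = processor(audio_input, sampling_rate=16000, return_tensors="pt").input_features
# Generate token ids; gradients are not needed for inference
with torch.no_grad():
    predicted_ids = model.generate(input_features)
# Decode the token ids to text; batch_decode returns one string per input
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)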
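Whisper processes audio in fixed 30-second windows, so longer recordings such as interviews or video soundtracks need to be split before transcription. The sketch below assumes a 1-D NumPy waveform already at 16 kHz, like `audio_input` in the example above. `split_into_chunks` and `transcribe_long` are hypothetical helper names, and the naive non-overlapping split is one simple approach, not the method used to build or evaluate this model.

```python
from __future__ import annotations

import numpy as np

WHISPER_SAMPLE_RATE = 16_000  # Whisper expects 16 kHz mono audio
CHUNK_SECONDS = 30            # length of Whisper's fixed input window


def split_into_chunks(
    audio: np.ndarray,
    sample_rate: int = WHISPER_SAMPLE_RATE,
    chunk_seconds: int = CHUNK_SECONDS,
) -> list[np.ndarray]:
    """Split a 1-D waveform into consecutive, non-overlapping chunks.

    The final chunk may be shorter than chunk_seconds; the Whisper feature
    extractor pads short inputs to 30 seconds.
    """
    if audio.ndim != 1:
        raise ValueError("expected a 1-D mono waveform")
    chunk_len = sample_rate * chunk_seconds
    return [audio[start:start + chunk_len] for start in range(0, len(audio), chunk_len)]


def transcribe_long(audio: np.ndarray, model, processor, batch_size: int = 8) -> str:
    """Transcribe audio of any length by transcribing 30-second chunks in batches.

    `model` and `processor` are assumed to be the WhisperForConditionalGeneration
    and WhisperProcessor objects loaded in the inference example.
    """
    chunks = split_into_chunks(audio)
    texts: list[str] = []
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i:i + batch_size]
        # The processor accepts a list of waveforms and pads each one to 30 s
        features = processor(
            batch, sampling_rate=WHISPER_SAMPLE_RATE, return_tensors="pt"
        ).input_features
        predicted_ids = model.generate(features)
        texts.extend(processor.batch_decode(predicted_ids, skip_special_tokens=True))
    # Join chunk transcripts; a word cut at a chunk boundary may be garbled
    return " ".join(t.strip() for t in texts)
```

With the objects from the inference example, `transcribe_long(audio_input, model, processor)` returns a single string. Fixed 30-second cuts can split a word across two chunks; the transformers `pipeline("automatic-speech-recognition", ...)` with its `chunk_length_s` and `stride_length_s` options overlaps chunks to reduce this and may be preferable for production use.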
De Vries, N.J., Davel, M.H., Badenhorst, J., Basson, W.D., de Wet, F., Barnard, E. and de Waal, A. (2014). A smartphone-based ASR data collection tool for under-resourced languages. Speech Communication, 56, 119-131. https://hdl.handle.net/20.500.12185/279