Whisper Mongolian ASR Model

This is a Whisper-style model for Mongolian automatic speech recognition (ASR), trained from scratch with a custom implementation of the Whisper architecture.

Model Details

Architecture: Custom Whisper-like model trained from scratch
Training Data: Mozilla Common Voice Mongolian dataset
Performance Metrics:
- Word Error Rate (WER): 0.928 (92.8%)
- Character Error Rate (CER): 0.726 (72.6%)
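
For readers unfamiliar with these metrics, the sketch below shows how WER and CER are conventionally computed: the edit distance between reference and hypothesis, divided by the reference length. It is an illustration only and not the evaluation script used for this model. The function names are hypothetical, and counting spaces as characters in CER is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length.

    Assumption: spaces are counted as characters.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    reference = "сайн байна уу"
    hypothesis = "сайн байн уу"
    print(f"WER: {wer(reference, hypothesis):.3f}")
    print(f"CER: {cer(reference, hypothesis):.3f}")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis contains many more words than the reference. Lower values are better for both metrics.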

Usage

This model can be used in two ways:

1. Using the compatibility wrapper:

from transformers import pipeline
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
transcriber = pipeline("automatic-speech-recognition", 
                       model="Nasanbuyan/whisper-mongolian", 
                       device=device)

# Transcribe audio
result = transcriber("path/to/audio.mp3")
print(result["text"])

2. Using the original implementation:

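import sys

import torch

# "whisper-mongolian" contains a hyphen, so it cannot be imported as a Python package.
# Assumption: the repository is cloned to ./whisper-mongolian and whisper_model.py
# sits at its top level; adjust the path if your layout differs.
sys.path.insert(0, "whisper-mongolian")
from whisper_model import WhisperModel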

# Load the model
model = WhisperModel("Nasanbuyan/whisper-mongolian", device="cpu")

# Transcribe audio
segments, info = model.transcribe("path/to/audio.mp3")
transcription = " ".join([segment.text for segment in segments])
print(transcription)

Citation

If you use this model, please cite:

@misc{whisper-mongolian,
  author = {Your Name},
  title = {Whisper Mongolian ASR Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Nasanbuyan/whisper-mongolian}}
}