Whisper Medium Mn - Erkhembayar Gantulga

This model is a fine-tuned version of openai/whisper-medium on the Common Voice 17.0 and Google Fleurs datasets. It achieves the following results on the evaluation set:

Loss: 0.1083
Wer: 12.9580
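
The Wer figure above is a word error rate expressed as a percentage. For readers unfamiliar with the metric, the sketch below shows how such a score is typically computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. It is an illustration only. The function name `word_error_rate` is hypothetical, and the scores reported in this card were not produced by this code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length * 100."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words (Levenshtein distance, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

Under this definition, one wrong word in a four-word reference gives a WER of 25, and a perfect transcript gives 0.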

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

The training and test sets each combine the Mongolian subsets of Common Voice 17.0 ("mn") and Google Fleurs ("mn_mn"), loaded and merged as follows:

from datasets import Audio, DatasetDict, concatenate_datasets, load_dataset

# Common Voice 17.0, Mongolian ("mn"). token=True uses the locally saved
# Hugging Face token (replaces the deprecated use_auth_token argument).
# Note: the "validated" split is kept as in the original recipe; it may
# overlap with the other Common Voice splits.
common_voice = DatasetDict()
common_voice["train"] = load_dataset("mozilla-foundation/common_voice_17_0", "mn", split="train+validation+validated", token=True)
common_voice["test"] = load_dataset("mozilla-foundation/common_voice_17_0", "mn", split="test", token=True)

# Resample the audio to 16 kHz, the rate Whisper expects.
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16000))

# Keep only the "audio" and "sentence" columns.
common_voice = common_voice.remove_columns(
    ["accent", "age", "client_id", "down_votes", "gender", "locale", "path", "segment", "up_votes", "variant"]
)

# Google Fleurs, Mongolian ("mn_mn").
google_fleurs = DatasetDict()
google_fleurs["train"] = load_dataset("google/fleurs", "mn_mn", split="train+validation", token=True)
google_fleurs["test"] = load_dataset("google/fleurs", "mn_mn", split="test", token=True)

# Drop the metadata columns and rename "transcription" to "sentence" so the
# schema matches Common Voice before concatenation.
google_fleurs = google_fleurs.remove_columns(
    ["id", "num_samples", "path", "raw_transcription", "gender", "lang_id", "language", "lang_group_id"]
)
google_fleurs = google_fleurs.rename_column("transcription", "sentence")

# Merge the two corpora split by split.
dataset = DatasetDict()
dataset["train"] = concatenate_datasets([common_voice["train"], google_fleurs["train"]])
dataset["test"] = concatenate_datasets([common_voice["test"], google_fleurs["test"]])

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 16
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
training_steps: 4000
mixed_precision_training: Native AMP
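
To make the scheduler settings concrete, here is a minimal sketch of a linear schedule with warmup using the values listed above (peak rate 1e-05, 100 warmup steps, 4000 training steps). It assumes the usual shape of such a schedule: a linear ramp from zero during warmup, then a linear decay back to zero at the final step. The function name `linear_lr` is hypothetical, and the exact per-step values used during training may differ slightly.

```python
PEAK_LR = 1e-05      # learning_rate
WARMUP_STEPS = 100   # lr_scheduler_warmup_steps
TOTAL_STEPS = 4000   # training_steps


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 up to the peak rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Decay: fall linearly from the peak rate to 0 at the last step.
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)
```

Under these assumptions the rate peaks at step 100 and is back at half the peak around step 2050, roughly midway through the run.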

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
0.2986	0.4912	500	0.3557	40.1515
0.2012	0.9823	1000	0.2310	28.3512
0.099	1.4735	1500	0.1864	23.4453
0.0733	1.9646	2000	0.1405	18.3024
0.0231	2.4558	2500	0.1308	16.5645
0.0191	2.9470	3000	0.1155	14.5569
0.0059	3.4381	3500	0.1122	13.4728
0.006	3.9293	4000	0.1083	12.9580
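
The Epoch and Step columns allow a rough estimate of the training set size, and the Wer column shows how far the error rate fell between the first and last evaluation. The arithmetic below is a sketch that assumes a single device and no gradient accumulation, so that each step consumes exactly train_batch_size = 16 examples. Neither assumption is stated in this card, and the variable names are illustrative.

```python
TRAIN_BATCH_SIZE = 16

# Last row of the results table.
final_step, final_epoch = 4000, 3.9293

# Steps needed for one pass over the training data (about 1018).
steps_per_epoch = final_step / final_epoch

# Approximate number of training examples under the stated assumptions.
approx_train_examples = steps_per_epoch * TRAIN_BATCH_SIZE

# Relative WER reduction between step 500 and step 4000.
first_wer, final_wer = 40.1515, 12.9580
relative_wer_reduction = (first_wer - final_wer) / first_wer
```

This puts the combined training set at roughly 16,000 examples and the relative WER reduction between step 500 and step 4000 at about two thirds.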

Framework versions

Transformers 4.44.0
Pytorch 2.3.1+cu121
Datasets 2.21.0
Tokenizers 0.19.1
