Whisper Small Hy 2 - Erik Mkrtchyan

This model is a fine-tuned version of openai/whisper-small on the Hy Generated Audio Data dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0999
  • Wer: 22.7854
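The WER above is reported as a percentage. For reference, a minimal sketch of how it can be computed with the `evaluate` library (the prediction and reference strings below are purely illustrative):

```python
import evaluate

# Load the word-error-rate metric; `compute` returns a fraction,
# so multiply by 100 to match the percentage reported above.
wer_metric = evaluate.load("wer")

predictions = ["illustrative hypothesis transcript"]
references = ["illustrative reference transcript"]
wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}")
```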

Model description

This model is based on OpenAI's Whisper Small and fine-tuned for Armenian using a combination of real and synthetic audio data. It is designed to transcribe Armenian speech into text.
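For example, a minimal transcription sketch using the `transformers` pipeline API (the audio file name is a placeholder):

```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hub.
asr = pipeline(
    "automatic-speech-recognition",
    model="ErikMkrtchyan/whisper-small-hy-2",
)

# The pipeline accepts a path to an audio file (placeholder below) or a
# raw waveform array; Whisper expects 16 kHz input and the pipeline
# resamples automatically.
result = asr("armenian_sample.wav")
print(result["text"])
```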

Intended uses & limitations

Intended Uses:

  • Armenian speech-to-text applications
  • Research on ASR for low-resource languages
  • Educational and experimental projects involving Whisper models

Limitations:

  • May not generalize well to accents or noisy audio not represented in the training set
  • The model may hallucinate text or produce inaccurate transcriptions, especially on unusual or out-of-distribution inputs, due to the inclusion of TTS-generated synthetic data in training.

Training and evaluation data

The dataset contains both real and high-quality synthetic Armenian speech clips.

| Split        | # Clips | Duration (hours) |
|--------------|---------|------------------|
| train        | 9,300   | 13.53            |
| test         | 5,818   | 9.16             |
| eval         | 5,856   | 8.76             |
| generated #1 | 100,000 | 113.61           |
| generated #2 | 137,419 | 173.76           |

  • Total duration: ~318 hours
  • Train set duration (train + generated #1 + generated #2): ~300 hours
  • Test set duration (test + eval): ~18 hours
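A sketch of assembling these splits with the `datasets` library; the Hub dataset id and the exact split names here are assumptions, since the card only names the dataset informally:

```python
from datasets import load_dataset, concatenate_datasets

# Placeholder Hub id for the "Hy Generated Audio Data" dataset.
ds = load_dataset("ErikMkrtchyan/hy-generated-audio-data")

# Training uses real + synthetic clips (~300 hours); the names of the
# two generated subsets are assumed.
train_ds = concatenate_datasets([ds["train"], ds["generated_1"], ds["generated_2"]])

# Test and eval splits (~18 hours) are held out for evaluation.
eval_ds = concatenate_datasets([ds["test"], ds["eval"]])
```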

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 3
  • mixed_precision_training: Native AMP
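These settings correspond roughly to the following `Seq2SeqTrainingArguments`. This is a hedged sketch, not the exact training script; `output_dir` and any omitted arguments are assumptions:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-hy-2",   # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",                 # AdamW, betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=3,
    fp16=True,                           # Native AMP mixed precision
)
```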

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Wer     |
|---------------|--------|-------|-----------------|---------|
| 0.0516        | 0.4999 | 7709  | 0.1417          | 33.0858 |
| 0.0366        | 0.9999 | 15418 | 0.1139          | 27.4340 |
| 0.0275        | 1.4998 | 23127 | 0.1057          | 25.0415 |
| 0.0308        | 1.9997 | 30836 | 0.0981          | 23.7545 |
| 0.0170        | 2.4997 | 38545 | 0.1016          | 23.2408 |
| 0.0190        | 2.9996 | 46254 | 0.0999          | 22.7854 |

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.7.0+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.1