---
library_name: transformers
language:
  - hy
license: apache-2.0
base_model: openai/whisper-small
tags:
  - generated_from_trainer
  - onnx
datasets:
  - ErikMkrtchyan/Hy-Generated-audio-data-with-cv20.0
  - ErikMkrtchyan/Hy-Generated-audio-data-2
metrics:
  - wer
model-index:
  - name: Whisper Small Hy 2 - Erik Mkrtchyan
    results:
      - task:
          type: automatic-speech-recognition
          name: Automatic Speech Recognition
        dataset:
          name: Hy Generated Audio Data with CV 20.0
          type: ErikMkrtchyan/Hy-Generated-audio-data-with-cv20.0
          args: 'split: generated+train, eval_split: eval+test'
        metrics:
          - type: wer
            value: 22.785422785422785
            name: Wer
---

# Whisper Small Hy 2 - Erik Mkrtchyan

This model is a fine-tuned version of openai/whisper-small on the Hy Generated Audio Data datasets. It achieves the following results on the evaluation set:

- Loss: 0.0999
- Wer: 22.7854
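
The WER above is on a 0-100 scale. For reference, it can be recomputed with the `evaluate` library; this is a minimal sketch with placeholder strings, not the exact evaluation script used for this card:

```python
# Minimal WER computation with the `evaluate` library (illustrative only;
# the predictions/references below are placeholders, not model output).
import evaluate

wer_metric = evaluate.load("wer")
wer = 100 * wer_metric.compute(
    predictions=["example hypothesis"],  # hypothetical ASR output
    references=["example reference"],    # hypothetical ground truth
)
print(f"WER: {wer:.4f}")
```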

## Model description

This model is based on OpenAI's Whisper Small and fine-tuned for Armenian using a combination of real and synthetic audio data. It is designed to transcribe Armenian speech into text.
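
For transcription, the model can be loaded through the standard `transformers` ASR pipeline. A minimal sketch, assuming the repository id `ErikMkrtchyan/whisper-small-hy-2` (inferred from this card's title) and a local 16 kHz audio file:

```python
# Sketch: transcribing Armenian audio with the transformers ASR pipeline.
# The model id and audio path are assumptions for illustration.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="ErikMkrtchyan/whisper-small-hy-2",
)

# Force Armenian decoding; Whisper otherwise auto-detects the language.
result = asr("sample.wav", generate_kwargs={"language": "hy", "task": "transcribe"})
print(result["text"])
```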

## Intended uses & limitations

Intended Uses:

- Armenian speech-to-text applications
- Research on ASR for low-resource languages
- Educational and experimental projects involving Whisper models

Limitations:

- May not generalize well to accents or noisy audio not represented in the training set
- The model may hallucinate text or produce inaccurate transcriptions, especially on unusual or out-of-distribution inputs, due to the inclusion of TTS-generated synthetic data in training

## Training and evaluation data

The dataset contains both real and high-quality synthetic Armenian speech clips.

| Split         | # Clips | Duration (hours) |
|---------------|---------|------------------|
| train         | 9,300   | 13.53            |
| test          | 5,818   | 9.16             |
| eval          | 5,856   | 8.76             |
| generated     | 100,000 | 113.61           |
| generated (2) | 137,419 | 173.76           |

- Total duration: ~318 hours
- Train set duration (train + generated + generated (2)): ~300 hours
- Test set duration (test + eval): ~18 hours
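
The model-index metadata above (`split: generated+train, eval_split: eval+test`) suggests the training and evaluation sets were assembled by concatenating splits from the two datasets. A hedged sketch with the `datasets` library; the split names of the second dataset are an assumption, not confirmed by this card:

```python
# Sketch: assembling the train/eval sets described above.
# The "generated" split name in the second dataset is assumed.
from datasets import concatenate_datasets, load_dataset

cv = load_dataset("ErikMkrtchyan/Hy-Generated-audio-data-with-cv20.0")
extra = load_dataset("ErikMkrtchyan/Hy-Generated-audio-data-2")

train_set = concatenate_datasets([cv["train"], cv["generated"], extra["generated"]])
eval_set = concatenate_datasets([cv["eval"], cv["test"]])
```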

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 3
- mixed_precision_training: Native AMP
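
These settings map directly onto `Seq2SeqTrainingArguments` in `transformers`. A sketch of the corresponding configuration (the actual training script is not published with this card, and `output_dir` is a placeholder):

```python
# Sketch: Seq2SeqTrainingArguments mirroring the hyperparameters above.
# fp16=True corresponds to "Native AMP" mixed-precision training.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-hy-2",  # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",                # AdamW, betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=3,
    fp16=True,
)
```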

### Training results

| Training Loss | Epoch  | Step  | Validation Loss | Wer     |
|---------------|--------|-------|-----------------|---------|
| 0.0516        | 0.4999 | 7709  | 0.1417          | 33.0858 |
| 0.0366        | 0.9999 | 15418 | 0.1139          | 27.4340 |
| 0.0275        | 1.4998 | 23127 | 0.1057          | 25.0415 |
| 0.0308        | 1.9997 | 30836 | 0.0981          | 23.7545 |
| 0.017         | 2.4997 | 38545 | 0.1016          | 23.2408 |
| 0.019         | 2.9996 | 46254 | 0.0999          | 22.7854 |

### Framework versions

- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1