---
library_name: transformers
language:
  - hy
license: apache-2.0
base_model: openai/whisper-small
tags:
  - generated_from_trainer
  - onnx
datasets:
  - ErikMkrtchyan/Hy-Generated-audio-data-with-cv20.0
  - ErikMkrtchyan/Hy-Generated-audio-data-2
metrics:
  - wer
model-index:
  - name: Whisper Small Hy 2 - Erik Mkrtchyan
    results:
      - task:
          type: automatic-speech-recognition
          name: Automatic Speech Recognition
        dataset:
          name: Hy Generated Audio Data with CV 20.0
          type: ErikMkrtchyan/Hy-Generated-audio-data-with-cv20.0
          args: 'split: generated+train, eval_split: eval+test'
        metrics:
          - type: wer
            value: 22.785422785422785
            name: Wer
---

# Whisper Small Hy 2 - Erik Mkrtchyan

This model is a fine-tuned version of openai/whisper-small on the Hy Generated Audio Data datasets. It achieves the following results on the evaluation set:

- Loss: 0.0999
- Wer: 22.7854
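
The WER above is on a 0-100 scale. For reference, it can be recomputed with the `evaluate` library; this is a minimal sketch with placeholder strings, not the exact evaluation script used for this card:

```python
# Minimal WER computation with the `evaluate` library (illustrative only;
# the predictions/references below are placeholders, not model output).
import evaluate

wer_metric = evaluate.load("wer")
wer = 100 * wer_metric.compute(
    predictions=["example hypothesis"],  # hypothetical ASR output
    references=["example reference"],    # hypothetical ground truth
)
print(f"WER: {wer:.4f}")
```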

## Model description

This model is based on OpenAI's Whisper Small and fine-tuned for Armenian using a combination of real and synthetic audio data. It is designed to transcribe Armenian speech into text.
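
For transcription, the model can be loaded through the standard `transformers` ASR pipeline. A minimal sketch, assuming the repository id `ErikMkrtchyan/whisper-small-hy-2` (inferred from this card's title) and a local 16 kHz audio file:

```python
# Sketch: transcribing Armenian audio with the transformers ASR pipeline.
# The model id and audio path are assumptions for illustration.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="ErikMkrtchyan/whisper-small-hy-2",
)

# Force Armenian decoding; Whisper otherwise auto-detects the language.
result = asr("sample.wav", generate_kwargs={"language": "hy", "task": "transcribe"})
print(result["text"])
```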

## Intended uses & limitations

Intended Uses:

- Armenian speech-to-text applications
- Research on ASR for low-resource languages
- Educational and experimental projects involving Whisper models

Limitations:

- May not generalize well to accents or noisy audio not represented in the training set
- The model may hallucinate text or produce inaccurate transcriptions, especially on unusual or out-of-distribution inputs, due to the inclusion of TTS-generated synthetic data in training

## Training and evaluation data

The dataset contains both real and high-quality synthetic Armenian speech clips.

| Split         | # Clips | Duration (hours) |
|---------------|---------|------------------|
| train         | 9,300   | 13.53            |
| test          | 5,818   | 9.16             |
| eval          | 5,856   | 8.76             |
| generated     | 100,000 | 113.61           |
| generated (2) | 137,419 | 173.76           |

- Total duration: ~318 hours
- Train set duration (train + generated + generated (2)): ~300 hours
- Test set duration (test + eval): ~18 hours
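
The model-index metadata above (`split: generated+train, eval_split: eval+test`) suggests the training and evaluation sets were assembled by concatenating splits from the two datasets. A hedged sketch with the `datasets` library; the split names of the second dataset are an assumption, not confirmed by this card:

```python
# Sketch: assembling the train/eval sets described above.
# The "generated" split name in the second dataset is assumed.
from datasets import concatenate_datasets, load_dataset

cv = load_dataset("ErikMkrtchyan/Hy-Generated-audio-data-with-cv20.0")
extra = load_dataset("ErikMkrtchyan/Hy-Generated-audio-data-2")

train_set = concatenate_datasets([cv["train"], cv["generated"], extra["generated"]])
eval_set = concatenate_datasets([cv["eval"], cv["test"]])
```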

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 3
- mixed_precision_training: Native AMP
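
These settings map directly onto `Seq2SeqTrainingArguments` in `transformers`. A sketch of the corresponding configuration (the actual training script is not published with this card, and `output_dir` is a placeholder):

```python
# Sketch: Seq2SeqTrainingArguments mirroring the hyperparameters above.
# fp16=True corresponds to "Native AMP" mixed-precision training.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-hy-2",  # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",                # AdamW, betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=3,
    fp16=True,
)
```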

### Training results

| Training Loss | Epoch  | Step  | Validation Loss | Wer     |
|---------------|--------|-------|-----------------|---------|
| 0.0516        | 0.4999 | 7709  | 0.1417          | 33.0858 |
| 0.0366        | 0.9999 | 15418 | 0.1139          | 27.4340 |
| 0.0275        | 1.4998 | 23127 | 0.1057          | 25.0415 |
| 0.0308        | 1.9997 | 30836 | 0.0981          | 23.7545 |
| 0.017         | 2.4997 | 38545 | 0.1016          | 23.2408 |
| 0.019         | 2.9996 | 46254 | 0.0999          | 22.7854 |

### Framework versions

- Transformers 4.51.3
- PyTorch 2.7.0+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1