library_name: transformers
license: apache-2.0
base_model: openai/whisper-small
tags:
  - generated_from_trainer
datasets:
  - tachelhit_darija
metrics:
  - wer
model-index:
  - name: whisper-small-darija
    results:
      - task:
          type: automatic-speech-recognition
          name: Automatic Speech Recognition
        dataset:
          name: tachelhit_darija
          type: tachelhit_darija
          config: default
          split: None
          args: default
        metrics:
          - type: wer
            value: 27.93522267206478
            name: Wer

whisper-small-darija

This model is a fine-tuned version of openai/whisper-small on the tachelhit_darija dataset. It achieves the following results on the evaluation set at the final checkpoint (step 1000):

Validation loss: 0.3828
WER: 27.9352%
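WER (word error rate) is the word-level edit distance between the reference and predicted transcripts, divided by the number of reference words. The edit distance counts substitutions, deletions, and insertions. The values in this card appear to be on a 0-100 scale. Below is a minimal pure-Python sketch of corpus-level WER. It assumes whitespace tokenization and no text normalization; the card does not say which normalization, if any, was applied. `edit_distance` and `corpus_wer` are hypothetical helpers, not the `evaluate` or `jiwer` API.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, deletions, insertions all cost 1)."""
    # At the start of iteration i, prev[j] is the distance between ref_words[:i-1] and hyp_words[:j].
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            curr[j] = min(
                prev[j] + 1,                           # deletion of ref_word
                curr[j - 1] + 1,                       # insertion of hyp_word
                prev[j - 1] + (ref_word != hyp_word),  # substitution (0 if the words match)
            )
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Corpus-level WER in percent: total word errors / total reference words * 100."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = ref.split(), hyp.split()  # assumption: plain whitespace tokens
        errors += edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / total_words


# One substitution (b -> x) plus one deletion (d) over 4 reference words.
print(corpus_wer(["a b c d"], ["a x c"]))  # 50.0
```

Errors are summed over the whole evaluation set before dividing, so long utterances weigh more than short ones. This differs from averaging per-utterance WER.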

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 16
eval_batch_size: 8
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
training_steps: 1000
mixed_precision_training: Native AMP
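The settings are lr_scheduler_type: linear, 100 warmup steps, and 1000 training steps. Under them the learning rate rises linearly from 0 to 1e-05 over the first 100 steps, then decays linearly to 0 at step 1000. The small sketch below reproduces that rule. It is modeled on the multiplier used by transformers' `get_linear_schedule_with_warmup`. `linear_warmup_lr` is a hypothetical name used only for illustration.

```python
def linear_warmup_lr(step, base_lr=1e-5, warmup_steps=100, total_steps=1000):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup: ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr at warmup_steps to 0 at total_steps (never negative).
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_lr(step))
```

The defaults mirror the card's values, so the peak rate of 1e-05 is reached exactly at step 100. By step 550 the rate is back to half that value.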

Training results

Training Loss	Epoch	Step	Validation Loss	WER (%)
0.572	1.4286	100	0.6403	60.7287
0.2156	2.8571	200	0.4233	42.7800
0.0459	4.2857	300	0.3953	48.1781
0.0257	5.7143	400	0.3663	31.0391
0.0089	7.1429	500	0.3857	31.9838
0.0029	8.5714	600	0.3748	30.3644
0.0026	10.0	700	0.3756	29.4197
0.0012	11.4286	800	0.3801	27.5304
0.0011	12.8571	900	0.3821	27.9352
0.0013	14.2857	1000	0.3828	27.9352
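The summary results at the top of this card (loss 0.3828, WER 27.9352) match the last row, step 1000. They do not match the lowest-WER row: the best WER in the table is 27.5304, at step 800. Validation loss bottoms out at step 400 while training loss keeps falling toward zero, which is consistent with the model starting to overfit. The Epoch column also implies about 70 optimizer steps per epoch.

The sketch below reads these facts off the table. The values are copied from the table above. `Row` and the implied dataset-size arithmetic are illustrative and assume a single device with no gradient accumulation.

```python
from typing import NamedTuple


class Row(NamedTuple):
    step: int
    epoch: float
    train_loss: float
    val_loss: float
    wer: float  # percent


# Copied from the training results table.
RESULTS = [
    Row(100, 1.4286, 0.572, 0.6403, 60.7287),
    Row(200, 2.8571, 0.2156, 0.4233, 42.7800),
    Row(300, 4.2857, 0.0459, 0.3953, 48.1781),
    Row(400, 5.7143, 0.0257, 0.3663, 31.0391),
    Row(500, 7.1429, 0.0089, 0.3857, 31.9838),
    Row(600, 8.5714, 0.0029, 0.3748, 30.3644),
    Row(700, 10.0, 0.0026, 0.3756, 29.4197),
    Row(800, 11.4286, 0.0012, 0.3801, 27.5304),
    Row(900, 12.8571, 0.0011, 0.3821, 27.9352),
    Row(1000, 14.2857, 0.0013, 0.3828, 27.9352),
]

best_wer = min(RESULTS, key=lambda r: r.wer)
best_val_loss = min(RESULTS, key=lambda r: r.val_loss)
final = RESULTS[-1]

# Epochs are printed to 4 decimals, so round step/epoch to recover steps per epoch.
steps_per_epoch = {round(r.step / r.epoch) for r in RESULTS}

# Assumption: train_batch_size 16 on one device, no gradient accumulation, last
# partial batch kept. Then ceil(n / 16) == 70 bounds the training-set size n.
batch_size = 16
(spe,) = steps_per_epoch
implied_examples = ((spe - 1) * batch_size + 1, spe * batch_size)

print(f"final: step {final.step}, WER {final.wer}")
print(f"lowest WER: {best_wer.wer} at step {best_wer.step}")
print(f"lowest validation loss: {best_val_loss.val_loss} at step {best_val_loss.step}")
print(f"steps per epoch: {spe}, implied training examples: {implied_examples}")
```
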

Framework versions

Transformers 4.48.3
Pytorch 2.5.1+cu124
Datasets 3.2.0
Tokenizers 0.21.0
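Readers who want to approximate this run with the versions listed above can map the hyperparameters onto `Seq2SeqTrainingArguments` roughly as follows. This is a hedged reconstruction, not the author's script; the card does not publish its training code. The following are assumptions:

- `output_dir`
- `fp16=True`, inferred from "Native AMP"
- `predict_with_generate=True`
- step-based evaluation every 100 steps, inferred from the results table
- a single device

`TRAINING_KWARGS` and `make_training_args` are hypothetical names.

```python
# Hypothetical reconstruction of the card's hyperparameters as keyword arguments
# for transformers.Seq2SeqTrainingArguments (field names as in transformers 4.48).
TRAINING_KWARGS = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 16,  # assumption: one device, no gradient accumulation
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",             # OptimizerNames.ADAMW_TORCH
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 100,
    "max_steps": 1000,
    "fp16": True,                       # assumption: "Native AMP" in the card; needs a CUDA GPU
    "eval_strategy": "steps",           # assumption: inferred from results logged every 100 steps
    "eval_steps": 100,
    "predict_with_generate": True,      # assumption: generate transcripts so WER can be computed
}


def make_training_args(output_dir="./whisper-small-darija"):
    """Build Seq2SeqTrainingArguments from the reconstructed settings (output_dir is a placeholder)."""
    # Imported lazily so the settings above can be inspected without transformers installed.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Keeping the settings in a plain dict makes them easy to diff against the hyperparameter list above. With `fp16=True`, constructing the arguments on a CPU-only machine is expected to fail, so drop that flag when experimenting without a GPU.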