wav2vec2-base-librispeech-model

This model is a fine-tuned version of facebook/wav2vec2-base on the LIBRI10H - ENG dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5515
  • Wer: 0.4641
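
The snippet below is a minimal inference sketch using the Transformers CTC API. It assumes the repository id csikasote/wav2vec2-base-librispeech-model (taken from this card), that the repo ships a matching Wav2Vec2 processor/tokenizer config, and 16 kHz mono input audio; the audio path is a placeholder, and librosa is used only as one convenient way to load and resample the waveform.

```python
import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "csikasote/wav2vec2-base-librispeech-model"  # repo id from this card

processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
model.eval()

# Load a 16 kHz mono waveform; "audio.wav" is a placeholder path.
speech, _ = librosa.load("audio.wav", sr=16000)

inputs = processor(speech, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: take the most likely token at each frame, then collapse.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```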

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 0.0003
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 100.0
  • mixed_precision_training: Native AMP
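
As a rough guide, these settings correspond to a Transformers TrainingArguments configuration along the following lines. This is a hedged sketch rather than the exact training script: output_dir, and any setting not listed above, is a placeholder assumption.

```python
from transformers import TrainingArguments

# Hedged reconstruction of the hyperparameters listed above.
# output_dir (and anything not listed) is a placeholder assumption.
training_args = TrainingArguments(
    output_dir="wav2vec2-base-librispeech-model",
    learning_rate=3e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,  # effective train batch size of 8
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=100.0,
    fp16=True,  # native AMP mixed-precision training
)
```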

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Wer    |
|:-------------:|:-------:|:-----:|:---------------:|:------:|
| 3.544         | 1.4493  | 500   | 1.7568          | 0.9820 |
| 1.42          | 2.8986  | 1000  | 1.0275          | 0.8168 |
| 1.0403        | 4.3478  | 1500  | 0.8305          | 0.7173 |
| 0.8574        | 5.7971  | 2000  | 0.7293          | 0.6649 |
| 0.7315        | 7.2464  | 2500  | 0.6632          | 0.6025 |
| 0.6389        | 8.6957  | 3000  | 0.6286          | 0.5695 |
| 0.5679        | 10.1449 | 3500  | 0.6102          | 0.5489 |
| 0.5085        | 11.5942 | 4000  | 0.5863          | 0.5215 |
| 0.4579        | 13.0435 | 4500  | 0.5661          | 0.4933 |
| 0.4097        | 14.4928 | 5000  | 0.5646          | 0.4823 |
| 0.382         | 15.9420 | 5500  | 0.5515          | 0.4644 |
| 0.3426        | 17.3913 | 6000  | 0.5585          | 0.4514 |
| 0.32          | 18.8406 | 6500  | 0.5598          | 0.4475 |
| 0.2926        | 20.2899 | 7000  | 0.6246          | 0.4587 |
| 0.2735        | 21.7391 | 7500  | 0.5887          | 0.4439 |
| 0.257         | 23.1884 | 8000  | 0.5978          | 0.4355 |
| 0.2409        | 24.6377 | 8500  | 0.5721          | 0.4215 |
| 0.2246        | 26.0870 | 9000  | 0.5979          | 0.4187 |
| 0.2139        | 27.5362 | 9500  | 0.6103          | 0.4145 |
| 0.2014        | 28.9855 | 10000 | 0.6436          | 0.4157 |
| 0.1917        | 30.4348 | 10500 | 0.6471          | 0.4188 |
| 0.184         | 31.8841 | 11000 | 0.6410          | 0.4068 |
| 0.1752        | 33.3333 | 11500 | 0.6426          | 0.4086 |
| 0.169         | 34.7826 | 12000 | 0.6633          | 0.4025 |
| 0.1612        | 36.2319 | 12500 | 0.6466          | 0.3968 |
| 0.1553        | 37.6812 | 13000 | 0.6573          | 0.3941 |
| 0.15          | 39.1304 | 13500 | 0.6989          | 0.3956 |
| 0.1442        | 40.5797 | 14000 | 0.7209          | 0.4062 |
| 0.1409        | 42.0290 | 14500 | 0.6950          | 0.3961 |
| 0.1356        | 43.4783 | 15000 | 0.6816          | 0.3863 |
| 0.134         | 44.9275 | 15500 | 0.6896          | 0.3867 |
| 0.1288        | 46.3768 | 16000 | 0.7073          | 0.3844 |
| 0.1263        | 47.8261 | 16500 | 0.7207          | 0.3836 |
| 0.1218        | 49.2754 | 17000 | 0.7430          | 0.3812 |
| 0.1217        | 50.7246 | 17500 | 0.7588          | 0.3831 |
| 0.1183        | 52.1739 | 18000 | 0.7478          | 0.3813 |
| 0.113         | 53.6232 | 18500 | 0.7269          | 0.3779 |
| 0.1109        | 55.0725 | 19000 | 0.7117          | 0.3735 |
| 0.1102        | 56.5217 | 19500 | 0.7532          | 0.3689 |
| 0.1084        | 57.9710 | 20000 | 0.7608          | 0.3704 |
| 0.1042        | 59.4203 | 20500 | 0.7571          | 0.3677 |
| 0.1048        | 60.8696 | 21000 | 0.7745          | 0.3683 |
| 0.1005        | 62.3188 | 21500 | 0.7845          | 0.3712 |
| 0.1006        | 63.7681 | 22000 | 0.7633          | 0.3664 |
| 0.0976        | 65.2174 | 22500 | 0.7721          | 0.3639 |
| 0.096         | 66.6667 | 23000 | 0.7659          | 0.3643 |
| 0.0938        | 68.1159 | 23500 | 0.7658          | 0.3620 |
| 0.0933        | 69.5652 | 24000 | 0.7692          | 0.3579 |
| 0.092         | 71.0145 | 24500 | 0.7785          | 0.3625 |
| 0.089         | 72.4638 | 25000 | 0.7845          | 0.3615 |
| 0.088         | 73.9130 | 25500 | 0.7973          | 0.3586 |
| 0.0862        | 75.3623 | 26000 | 0.7806          | 0.3576 |
| 0.0851        | 76.8116 | 26500 | 0.7947          | 0.3583 |
| 0.0846        | 78.2609 | 27000 | 0.7802          | 0.3526 |
| 0.0809        | 79.7101 | 27500 | 0.8093          | 0.3532 |
| 0.0813        | 81.1594 | 28000 | 0.8237          | 0.3572 |
| 0.0785        | 82.6087 | 28500 | 0.8130          | 0.3533 |
| 0.0799        | 84.0580 | 29000 | 0.7958          | 0.3511 |
| 0.0784        | 85.5072 | 29500 | 0.8108          | 0.3507 |
| 0.0767        | 86.9565 | 30000 | 0.8208          | 0.3511 |
| 0.0742        | 88.4058 | 30500 | 0.8270          | 0.3501 |
| 0.0746        | 89.8551 | 31000 | 0.8121          | 0.3459 |
| 0.073         | 91.3043 | 31500 | 0.8151          | 0.3485 |
| 0.0725        | 92.7536 | 32000 | 0.8265          | 0.3477 |
| 0.0717        | 94.2029 | 32500 | 0.8173          | 0.3446 |
| 0.0709        | 95.6522 | 33000 | 0.8135          | 0.3434 |
| 0.0704        | 97.1014 | 33500 | 0.8179          | 0.3431 |
| 0.0699        | 98.5507 | 34000 | 0.8134          | 0.3427 |
| 0.0691        | 100.0   | 34500 | 0.8155          | 0.3428 |

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.6.0+cu124
  • Datasets 3.3.2
  • Tokenizers 0.21.0
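
To approximate this environment, pinning the versions above is a reasonable starting point, e.g. `pip install transformers==4.49.0 datasets==3.3.2 tokenizers==0.21.0 torch==2.6.0` (the cu124 CUDA build of PyTorch may need to be installed from the PyTorch wheel index).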