metadata
license: apache-2.0
base_model: facebook/wav2vec2-large-xlsr-53
tags:
  - generated_from_trainer
datasets:
  - common_voice
metrics:
  - wer
model-index:
  - name: wav2vec2-large-xlsr53-zh-cn-subset-colab
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: common_voice
          type: common_voice
          config: zh-CN
          split: test[:20%]
          args: zh-CN
        metrics:
          - name: Wer
            type: wer
            value: 0.9394977168949772

wav2vec2-large-xlsr53-zh-cn-subset-colab

This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 on the zh-CN configuration of the common_voice dataset. At the final logged checkpoint (step 20800), it achieves the following results on the evaluation set (the test[:20%] split):

  • Loss: 1.3992
  • Wer: 0.9395
  • Cer: 0.3184
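
The gap between the WER (about 0.94) and the CER (about 0.32) may partly reflect how the metrics tokenize text. This card does not say how WER was computed; the sketch below assumes the common setup where words are obtained by splitting on whitespace. Written Chinese usually has no spaces, so under that assumption a whole sentence counts as one word and a single wrong character makes the whole sentence wrong. The function names and the example sentences are illustrative and do not come from the training run.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate, assuming words are separated by whitespace."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate over every character in the strings."""
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up sentences: one of six characters is wrong.
reference = "今天天气很好"
hypothesis = "今天天汽很好"

print(f"WER = {wer(reference, hypothesis):.4f}")
print(f"CER = {cer(reference, hypothesis):.4f}")
```

If this assumption holds for the reported numbers, CER is likely the more informative of the two metrics for this model.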

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

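The model-index metadata lists evaluation on the test[:20%] split of the common_voice zh-CN configuration. More information needed on the training data.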

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 13
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 26
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 100
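
As a rough illustration of how these values fit together, the sketch below recomputes the total train batch size and traces a linear schedule with warmup. It is plain Python rather than the training script itself. Two things are assumptions: a single device, and a total of 21,000 optimizer steps, inferred from the training log (epoch 40.0 falls at step 8400, which suggests 210 steps per epoch over 100 epochs). The function names are hypothetical.

```python
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 13
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 500
TOTAL_STEPS = 21_000  # assumption: 210 steps per epoch * 100 epochs


def effective_batch_size(per_device, accum_steps, num_devices=1):
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * accum_steps * num_devices


def linear_schedule_lr(step, base_lr=LEARNING_RATE,
                       warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0, total - step) / max(1, total - warmup)


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
for step in (0, 250, 500, 10_750, 21_000):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the learning rate peaks at 5e-05 at step 500 and falls to half of that at step 10,750, midway through the decay phase.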

Training results

| Training Loss | Epoch | Step | Validation Loss | Wer | Cer |
|---|---|---|---|---|---|
| No log | 1.9 | 400 | 33.6533 | 1.0 | 1.0 |
| 70.5767 | 3.81 | 800 | 6.8140 | 1.0 | 1.0 |
| 7.1379 | 5.71 | 1200 | 6.5163 | 1.0 | 1.0 |
| 6.4771 | 7.62 | 1600 | 6.4602 | 1.0 | 1.0 |
| 6.3627 | 9.52 | 2000 | 6.3406 | 1.0 | 0.9700 |
| 6.3627 | 11.43 | 2400 | 6.1021 | 1.0 | 0.9678 |
| 6.1201 | 13.33 | 2800 | 5.1523 | 1.0 | 0.8385 |
| 5.3531 | 15.24 | 3200 | 4.2224 | 1.0 | 0.7084 |
| 4.1733 | 17.14 | 3600 | 3.6981 | 1.0 | 0.6332 |
| 3.5472 | 19.05 | 4000 | 3.2708 | 0.9994 | 0.5827 |
| 3.5472 | 20.95 | 4400 | 2.9629 | 0.9989 | 0.5510 |
| 3.0668 | 22.86 | 4800 | 2.7122 | 0.9943 | 0.5165 |
| 2.7248 | 24.76 | 5200 | 2.5171 | 0.9914 | 0.4976 |
| 2.4609 | 26.67 | 5600 | 2.3538 | 0.9897 | 0.4759 |
| 2.2323 | 28.57 | 6000 | 2.2112 | 0.9874 | 0.4555 |
| 2.2323 | 30.48 | 6400 | 2.0850 | 0.9834 | 0.4370 |
| 2.0438 | 32.38 | 6800 | 1.9982 | 0.9806 | 0.4261 |
| 1.8837 | 34.29 | 7200 | 1.9179 | 0.9766 | 0.4137 |
| 1.7646 | 36.19 | 7600 | 1.8278 | 0.9766 | 0.4030 |
| 1.6469 | 38.1 | 8000 | 1.7627 | 0.9755 | 0.3937 |
| 1.6469 | 40.0 | 8400 | 1.7063 | 0.9709 | 0.3853 |
| 1.5422 | 41.9 | 8800 | 1.6649 | 0.9663 | 0.3787 |
| 1.4561 | 43.81 | 9200 | 1.6336 | 0.9697 | 0.3714 |
| 1.3842 | 45.71 | 9600 | 1.5943 | 0.9606 | 0.3647 |
| 1.3164 | 47.62 | 10000 | 1.5681 | 0.9669 | 0.3621 |
| 1.3164 | 49.52 | 10400 | 1.5535 | 0.9600 | 0.3582 |
| 1.2654 | 51.43 | 10800 | 1.5354 | 0.9538 | 0.3544 |
| 1.2186 | 53.33 | 11200 | 1.5003 | 0.9555 | 0.3482 |
| 1.1781 | 55.24 | 11600 | 1.4979 | 0.9572 | 0.3473 |
| 1.1344 | 57.14 | 12000 | 1.4820 | 0.9549 | 0.3453 |
| 1.1344 | 59.05 | 12400 | 1.4707 | 0.9509 | 0.3396 |
| 1.0965 | 60.95 | 12800 | 1.4657 | 0.9509 | 0.3384 |
| 1.0637 | 62.86 | 13200 | 1.4610 | 0.9509 | 0.3371 |
| 1.0306 | 64.76 | 13600 | 1.4461 | 0.9509 | 0.3361 |
| 1.0014 | 66.67 | 14000 | 1.4437 | 0.9503 | 0.3328 |
| 1.0014 | 68.57 | 14400 | 1.4334 | 0.9463 | 0.3304 |
| 0.9758 | 70.48 | 14800 | 1.4267 | 0.9429 | 0.3295 |
| 0.9486 | 72.38 | 15200 | 1.4250 | 0.9469 | 0.3269 |
| 0.933 | 74.29 | 15600 | 1.4214 | 0.9441 | 0.3273 |
| 0.9121 | 76.19 | 16000 | 1.4161 | 0.9441 | 0.3267 |
| 0.9121 | 78.1 | 16400 | 1.4137 | 0.9446 | 0.3268 |
| 0.9001 | 80.0 | 16800 | 1.4216 | 0.9446 | 0.3253 |
| 0.8789 | 81.9 | 17200 | 1.4164 | 0.9435 | 0.3264 |
| 0.8659 | 83.81 | 17600 | 1.3996 | 0.9424 | 0.3216 |
| 0.8471 | 85.71 | 18000 | 1.4079 | 0.9458 | 0.3226 |
| 0.8471 | 87.62 | 18400 | 1.4042 | 0.9412 | 0.3214 |
| 0.8387 | 89.52 | 18800 | 1.4073 | 0.9424 | 0.3214 |
| 0.8299 | 91.43 | 19200 | 1.4005 | 0.9418 | 0.3192 |
| 0.8257 | 93.33 | 19600 | 1.4040 | 0.9406 | 0.3200 |
| 0.813 | 95.24 | 20000 | 1.4012 | 0.9412 | 0.3184 |
| 0.813 | 97.14 | 20400 | 1.4011 | 0.9389 | 0.3183 |
| 0.8062 | 99.05 | 20800 | 1.3992 | 0.9395 | 0.3184 |
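
The headline results correspond to the last logged step, which is not necessarily the best step for every metric. As a small sketch of how to check this, the snippet below copies the final four evaluation rows from the log above into a list and picks the best row per metric. The tuple layout and variable names are illustrative, not part of the training code.

```python
# (step, validation_loss, wer, cer), copied from the last four log rows
rows = [
    (19600, 1.4040, 0.9406, 0.3200),
    (20000, 1.4012, 0.9412, 0.3184),
    (20400, 1.4011, 0.9389, 0.3183),
    (20800, 1.3992, 0.9395, 0.3184),
]

# Lower is better for all three metrics.
best_by_loss = min(rows, key=lambda r: r[1])
best_by_wer = min(rows, key=lambda r: r[2])
best_by_cer = min(rows, key=lambda r: r[3])

print("lowest validation loss at step", best_by_loss[0])
print("lowest WER at step", best_by_wer[0])
print("lowest CER at step", best_by_cer[0])
```

Among these rows, step 20800 has the lowest validation loss, while step 20400 has slightly lower WER and CER. The differences are small enough that they may be within run-to-run noise.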

Framework versions

  • Transformers 4.32.0.dev0
  • Pytorch 2.0.1+cu118
  • Datasets 2.14.4
  • Tokenizers 0.13.3