metadata

license: apache-2.0
base_model: facebook/wav2vec2-large-xlsr-53
tags:
  - generated_from_trainer
datasets:
  - common_voice
metrics:
  - wer
model-index:
  - name: wav2vec2-large-xlsr53-zh-cn-subset-colab
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: common_voice
          type: common_voice
          config: zh-CN
          split: test[:20%]
          args: zh-CN
        metrics:
          - name: Wer
            type: wer
            value: 0.9394977168949772

wav2vec2-large-xlsr53-zh-cn-subset-colab

This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 on the zh-CN configuration of the common_voice dataset. It achieves the following results on the evaluation set (the first 20% of the test split, per the metadata above):

Loss: 1.3992
Wer: 0.9395
Cer: 0.3184
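
WER and CER are both edit-distance ratios and differ only in the unit counted. The sketch below is a minimal pure-Python illustration, not the evaluation code used for this model (the source does not show it). It assumes WER splits on whitespace, as common implementations do. Under that assumption an unsegmented Chinese sentence counts as a single word, which may explain why WER stays near 1.0 while CER falls to about 0.32. The helper names `edit_distance`, `error_rate`, `wer`, and `cer`, and the example sentences, are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute, insert, delete)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, tokenize):
    """Total edit errors divided by total reference tokens."""
    errors = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in zip(refs, hyps))
    total = sum(len(tokenize(r)) for r in refs)
    return errors / total


def wer(refs, hyps):
    # Assumption: words are whitespace-separated tokens.
    return error_rate(refs, hyps, str.split)


def cer(refs, hyps):
    # Every character is one token.
    return error_rate(refs, hyps, list)


# Hypothetical example: one wrong character in a six-character sentence.
refs = ["今天天气很好"]
hyps = ["今天天汽很好"]
print(wer(refs, hyps))  # the whole sentence is one "word", so one error in one word
print(cer(refs, hyps))  # one error in six characters
```

With a single wrong character, the sentence-level "word" is already counted as fully wrong, while the character-level rate stays low. Segmenting the text into words before scoring would give a different WER.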

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 13
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 26
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 100
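
The relationship between these values can be sketched in a few lines. This is an illustration under stated assumptions, not the training script: it assumes a single device (consistent with 13 x 2 = 26) and a total of about 21,000 optimizer steps, estimated from the results table, which logs step 8400 at epoch 40.0 (about 210 steps per epoch over 100 epochs). The function name `linear_lr` is hypothetical; it mirrors the usual linear schedule with warmup, where the rate rises from 0 to the peak over the warmup steps and then decays linearly to 0.

```python
train_batch_size = 13
gradient_accumulation_steps = 2
learning_rate = 5e-05
warmup_steps = 500
total_steps = 21_000  # assumption: ~210 steps/epoch x 100 epochs, read off the results table

# Gradients from 2 micro-batches of 13 are accumulated before each optimizer step.
effective_batch_size = train_batch_size * gradient_accumulation_steps


def linear_lr(step):
    """Learning rate at a given optimizer step for linear warmup then linear decay."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / (total_steps - warmup_steps)


print(effective_batch_size)
for step in (0, 250, 500, 10_750, 21_000):
    print(step, linear_lr(step))
```

The peak rate of 5e-05 is reached at step 500, and roughly half of it remains at the midpoint of the decay phase (step 10,750 under the assumed total).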

Training results

Training Loss	Epoch	Step	Validation Loss	Wer	Cer
No log	1.9	400	33.6533	1.0	1.0
70.5767	3.81	800	6.8140	1.0	1.0
7.1379	5.71	1200	6.5163	1.0	1.0
6.4771	7.62	1600	6.4602	1.0	1.0
6.3627	9.52	2000	6.3406	1.0	0.9700
6.3627	11.43	2400	6.1021	1.0	0.9678
6.1201	13.33	2800	5.1523	1.0	0.8385
5.3531	15.24	3200	4.2224	1.0	0.7084
4.1733	17.14	3600	3.6981	1.0	0.6332
3.5472	19.05	4000	3.2708	0.9994	0.5827
3.5472	20.95	4400	2.9629	0.9989	0.5510
3.0668	22.86	4800	2.7122	0.9943	0.5165
2.7248	24.76	5200	2.5171	0.9914	0.4976
2.4609	26.67	5600	2.3538	0.9897	0.4759
2.2323	28.57	6000	2.2112	0.9874	0.4555
2.2323	30.48	6400	2.0850	0.9834	0.4370
2.0438	32.38	6800	1.9982	0.9806	0.4261
1.8837	34.29	7200	1.9179	0.9766	0.4137
1.7646	36.19	7600	1.8278	0.9766	0.4030
1.6469	38.1	8000	1.7627	0.9755	0.3937
1.6469	40.0	8400	1.7063	0.9709	0.3853
1.5422	41.9	8800	1.6649	0.9663	0.3787
1.4561	43.81	9200	1.6336	0.9697	0.3714
1.3842	45.71	9600	1.5943	0.9606	0.3647
1.3164	47.62	10000	1.5681	0.9669	0.3621
1.3164	49.52	10400	1.5535	0.9600	0.3582
1.2654	51.43	10800	1.5354	0.9538	0.3544
1.2186	53.33	11200	1.5003	0.9555	0.3482
1.1781	55.24	11600	1.4979	0.9572	0.3473
1.1344	57.14	12000	1.4820	0.9549	0.3453
1.1344	59.05	12400	1.4707	0.9509	0.3396
1.0965	60.95	12800	1.4657	0.9509	0.3384
1.0637	62.86	13200	1.4610	0.9509	0.3371
1.0306	64.76	13600	1.4461	0.9509	0.3361
1.0014	66.67	14000	1.4437	0.9503	0.3328
1.0014	68.57	14400	1.4334	0.9463	0.3304
0.9758	70.48	14800	1.4267	0.9429	0.3295
0.9486	72.38	15200	1.4250	0.9469	0.3269
0.933	74.29	15600	1.4214	0.9441	0.3273
0.9121	76.19	16000	1.4161	0.9441	0.3267
0.9121	78.1	16400	1.4137	0.9446	0.3268
0.9001	80.0	16800	1.4216	0.9446	0.3253
0.8789	81.9	17200	1.4164	0.9435	0.3264
0.8659	83.81	17600	1.3996	0.9424	0.3216
0.8471	85.71	18000	1.4079	0.9458	0.3226
0.8471	87.62	18400	1.4042	0.9412	0.3214
0.8387	89.52	18800	1.4073	0.9424	0.3214
0.8299	91.43	19200	1.4005	0.9418	0.3192
0.8257	93.33	19600	1.4040	0.9406	0.3200
0.813	95.24	20000	1.4012	0.9412	0.3184
0.813	97.14	20400	1.4011	0.9389	0.3183
0.8062	99.05	20800	1.3992	0.9395	0.3184
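
The best checkpoint depends on which metric is used to choose it. The sketch below copies the last five rows of the table and picks the minimum for each metric. The `Row` type and variable names are hypothetical, and the source does not state how the reported checkpoint was selected. The headline results match the final row (step 20800).

```python
from collections import namedtuple

Row = namedtuple("Row", ["step", "val_loss", "wer", "cer"])

# Last five evaluation rows, copied from the training results table.
rows = [
    Row(19200, 1.4005, 0.9418, 0.3192),
    Row(19600, 1.4040, 0.9406, 0.3200),
    Row(20000, 1.4012, 0.9412, 0.3184),
    Row(20400, 1.4011, 0.9389, 0.3183),
    Row(20800, 1.3992, 0.9395, 0.3184),
]

best_by_loss = min(rows, key=lambda r: r.val_loss)
best_by_wer = min(rows, key=lambda r: r.wer)
best_by_cer = min(rows, key=lambda r: r.cer)

print("lowest validation loss:", best_by_loss)
print("lowest WER:", best_by_wer)
print("lowest CER:", best_by_cer)
```

Among these rows, validation loss is lowest at step 20800, while WER and CER are marginally lower at step 20400. The differences are small enough that the curves can be read as having plateaued.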

Framework versions

Transformers 4.32.0.dev0
PyTorch 2.0.1+cu118
Datasets 2.14.4
Tokenizers 0.13.3