Alvin-Nahabwe committed on
Commit
6e108a7
·
verified ·
1 Parent(s): 1386baa

End of training

Browse files
README.md ADDED
@@ -0,0 +1,194 @@
1
+ ---
2
+ library_name: transformers
3
+ language:
4
+ - lg
5
+ base_model: asr-africa/wav2vec2-asr-africa-base
6
+ tags:
7
+ - asr
8
+ - luganda
9
+ - wav2vec2-base
10
+ - speech
11
+ - asr-africa
12
+ - robust-fine-tuning
13
+ - lg
14
+ - generated_from_trainer
15
+ datasets:
16
+ - mozilla-foundation/common_voice_17_0
17
+ - google/fleurs
18
+ metrics:
19
+ - wer
20
+ model-index:
21
+ - name: Wav2Vec2-Base - Luganda - asr-africa
22
+ results:
23
+ - task:
24
+ name: Automatic Speech Recognition
25
+ type: automatic-speech-recognition
26
+ dataset:
27
+ name: common_voice_17_0
28
+ type: mozilla-foundation/common_voice_17_0
29
+ metrics:
30
+ - name: Wer
31
+ type: wer
32
+ value: 0.1883851956379634
33
+ - task:
34
+ name: Automatic Speech Recognition
35
+ type: automatic-speech-recognition
36
+ dataset:
37
+ name: fleurs
38
+ type: google/fleurs
39
+ metrics:
40
+ - name: Wer
41
+ type: wer
42
+ value: 0.1883851956379634
43
+ ---
44
+
45
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
46
+ should probably proofread and complete it, then remove this comment. -->
47
+
48
+ # Wav2Vec2-Base - Luganda - asr-africa
49
+
50
+ This model is a fine-tuned version of [asr-africa/wav2vec2-asr-africa-base](https://huggingface.co/asr-africa/wav2vec2-asr-africa-base) on the common_voice_17_0 and the fleurs datasets.
51
+ It achieves the following results on the evaluation set:
52
+ - Loss: 0.1244
53
+ - Wer: 0.1884
54
+ - Cer: 0.0350
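The WER and CER figures above are edit-distance error rates over words and characters respectively. The following is a minimal sketch of how such rates can be computed for a single utterance; the helper names and the example transcript pair are illustrative assumptions, not taken from the training script, and the card's numbers were presumably aggregated over the whole evaluation set rather than one sentence.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical reference/hypothesis pair, for illustration only.
reference = "webale nnyo ssebo"
hypothesis = "webale nyo ssebo"
print(wer(reference, hypothesis), cer(reference, hypothesis))
```

One misspelled word out of three counts as a full word error but only a single character error, which is why CER is typically much lower than WER, as in the results above.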
55
+
56
+ ## Model description
57
+
58
+ More information needed
59
+
60
+ ## Intended uses & limitations
61
+
62
+ More information needed
63
+
64
+ ## Training and evaluation data
65
+
66
+ More information needed
67
+
68
+ ## Training procedure
69
+
70
+ ### Training hyperparameters
71
+
72
+ The following hyperparameters were used during training:
73
+ - learning_rate: 7e-05
74
+ - train_batch_size: 64
75
+ - eval_batch_size: 32
76
+ - seed: 42
77
+ - optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
78
+ - lr_scheduler_type: linear
79
+ - lr_scheduler_warmup_ratio: 0.01
80
+ - num_epochs: 100.0
81
+ - mixed_precision_training: Native AMP
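To make the `lr_scheduler_type: linear` and `lr_scheduler_warmup_ratio: 0.01` settings concrete, here is a sketch of an idealized linear warmup and decay schedule. It assumes 301200 total optimizer steps (the final step in the results table) and that the warmup length is the ceiling of ratio times total steps; the function name is hypothetical, and learning rates logged during the actual run may deviate from this idealized curve.

```python
import math


def linear_schedule_lr(step, base_lr=7e-05, total_steps=301200, warmup_ratio=0.01):
    """Learning rate at a given step for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 at the final step.
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


for step in (0, 1506, 3012, 150600, 301200):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the warmup covers the first 3012 steps, which is one epoch in the table below.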
82
+
83
+ ### Training results
84
+
85
+ | Training Loss | Epoch | Step | Cer | Validation Loss | Wer |
86
+ |:-------------:|:-----:|:------:|:------:|:---------------:|:------:|
87
+ | 3.2575 | 1.0 | 3012 | 0.1093 | 0.3726 | 0.5341 |
88
+ | 0.8396 | 2.0 | 6024 | 0.0781 | 0.2538 | 0.3984 |
89
+ | 0.7487 | 3.0 | 9036 | 0.0682 | 0.2277 | 0.3531 |
90
+ | 0.7226 | 4.0 | 12048 | 0.0643 | 0.2133 | 0.3364 |
91
+ | 0.7096 | 5.0 | 15060 | 0.0612 | 0.2084 | 0.3211 |
92
+ | 0.6979 | 6.0 | 18072 | 0.0640 | 0.2091 | 0.3287 |
93
+ | 0.6899 | 7.0 | 21084 | 0.0608 | 0.2019 | 0.3162 |
94
+ | 0.6765 | 8.0 | 24096 | 0.0601 | 0.1973 | 0.3106 |
95
+ | 0.6701 | 9.0 | 27108 | 0.0582 | 0.1928 | 0.3047 |
96
+ | 0.6621 | 10.0 | 30120 | 0.0582 | 0.1924 | 0.3039 |
97
+ | 0.6554 | 11.0 | 33132 | 0.0566 | 0.1867 | 0.2983 |
98
+ | 0.6475 | 12.0 | 36144 | 0.0552 | 0.1829 | 0.2874 |
99
+ | 0.6429 | 13.0 | 39156 | 0.0542 | 0.1802 | 0.2853 |
100
+ | 0.6351 | 14.0 | 42168 | 0.0553 | 0.1826 | 0.2873 |
101
+ | 0.6319 | 15.0 | 45180 | 0.0544 | 0.1793 | 0.2832 |
102
+ | 0.6251 | 16.0 | 48192 | 0.0548 | 0.1785 | 0.2838 |
103
+ | 0.6172 | 17.0 | 51204 | 0.0517 | 0.1709 | 0.2719 |
104
+ | 0.6122 | 18.0 | 54216 | 0.0521 | 0.1720 | 0.2716 |
105
+ | 0.6068 | 19.0 | 57228 | 0.0505 | 0.1694 | 0.2665 |
106
+ | 0.6035 | 20.0 | 60240 | 0.0497 | 0.1670 | 0.2628 |
107
+ | 0.5957 | 21.0 | 63252 | 0.0504 | 0.1704 | 0.2644 |
108
+ | 0.5909 | 22.0 | 66264 | 0.0493 | 0.1653 | 0.2599 |
109
+ | 0.5879 | 23.0 | 69276 | 0.0487 | 0.1675 | 0.2573 |
110
+ | 0.5966 | 24.0 | 72288 | 0.0510 | 0.1943 | 0.2739 |
111
+ | 0.6444 | 25.0 | 75300 | 0.0515 | 0.1868 | 0.2723 |
112
+ | 0.5999 | 26.0 | 78312 | 0.0491 | 0.1677 | 0.2578 |
113
+ | 0.5911 | 27.0 | 81324 | 0.0474 | 0.1679 | 0.2510 |
114
+ | 0.586 | 28.0 | 84336 | 0.0484 | 0.1723 | 0.2539 |
115
+ | 0.5816 | 29.0 | 87348 | 0.0477 | 0.1678 | 0.2526 |
116
+ | 0.5886 | 30.0 | 90360 | 0.0499 | 0.1824 | 0.2629 |
117
+ | 0.5978 | 31.0 | 93372 | 0.0470 | 0.1620 | 0.2491 |
118
+ | 0.5722 | 32.0 | 96384 | 0.0465 | 0.1584 | 0.2472 |
119
+ | 0.5615 | 33.0 | 99396 | 0.0461 | 0.1564 | 0.2421 |
120
+ | 0.5566 | 34.0 | 102408 | 0.0448 | 0.1530 | 0.2368 |
121
+ | 0.5514 | 35.0 | 105420 | 0.0432 | 0.1499 | 0.2309 |
122
+ | 0.5485 | 36.0 | 108432 | 0.0436 | 0.1511 | 0.2308 |
123
+ | 0.5451 | 37.0 | 111444 | 0.0439 | 0.1507 | 0.2319 |
124
+ | 0.5433 | 38.0 | 114456 | 0.0434 | 0.1482 | 0.2312 |
125
+ | 0.5391 | 39.0 | 117468 | 0.0435 | 0.1468 | 0.2291 |
126
+ | 0.5347 | 40.0 | 120480 | 0.0430 | 0.1463 | 0.2274 |
127
+ | 0.5313 | 41.0 | 123492 | 0.0422 | 0.1450 | 0.2240 |
128
+ | 0.5291 | 42.0 | 126504 | 0.0419 | 0.1446 | 0.2241 |
129
+ | 0.5269 | 43.0 | 129516 | 0.0427 | 0.1453 | 0.2255 |
130
+ | 0.5253 | 44.0 | 132528 | 0.0425 | 0.1446 | 0.2253 |
131
+ | 0.523 | 45.0 | 135540 | 0.0412 | 0.1430 | 0.2202 |
132
+ | 0.5192 | 46.0 | 138552 | 0.0409 | 0.1414 | 0.2172 |
133
+ | 0.518 | 47.0 | 141564 | 0.0405 | 0.1404 | 0.2160 |
134
+ | 0.5139 | 48.0 | 144576 | 0.0401 | 0.1400 | 0.2143 |
135
+ | 0.5133 | 49.0 | 147588 | 0.0412 | 0.1414 | 0.2180 |
136
+ | 0.5114 | 50.0 | 150600 | 0.0404 | 0.1402 | 0.2149 |
137
+ | 0.5087 | 51.0 | 153612 | 0.0406 | 0.1404 | 0.2165 |
138
+ | 0.5066 | 52.0 | 156624 | 0.0404 | 0.1389 | 0.2157 |
139
+ | 0.5037 | 53.0 | 159636 | 0.0398 | 0.1375 | 0.2132 |
140
+ | 0.5024 | 54.0 | 162648 | 0.0398 | 0.1372 | 0.2121 |
141
+ | 0.5 | 55.0 | 165660 | 0.0401 | 0.1379 | 0.2132 |
142
+ | 0.4976 | 56.0 | 168672 | 0.0386 | 0.1349 | 0.2072 |
143
+ | 0.4948 | 57.0 | 171684 | 0.0393 | 0.1362 | 0.2102 |
144
+ | 0.4933 | 58.0 | 174696 | 0.0389 | 0.1355 | 0.2068 |
145
+ | 0.4924 | 59.0 | 177708 | 0.0385 | 0.1361 | 0.2055 |
146
+ | 0.4901 | 60.0 | 180720 | 0.0384 | 0.1346 | 0.2054 |
147
+ | 0.4898 | 61.0 | 183732 | 0.0384 | 0.1334 | 0.2050 |
148
+ | 0.4873 | 62.0 | 186744 | 0.0384 | 0.1342 | 0.2060 |
149
+ | 0.4865 | 63.0 | 189756 | 0.0387 | 0.1346 | 0.2070 |
150
+ | 0.4842 | 64.0 | 192768 | 0.0387 | 0.1346 | 0.2072 |
151
+ | 0.4822 | 65.0 | 195780 | 0.0381 | 0.1325 | 0.2040 |
152
+ | 0.4814 | 66.0 | 198792 | 0.0371 | 0.1312 | 0.1989 |
153
+ | 0.4796 | 67.0 | 201804 | 0.0374 | 0.1312 | 0.2000 |
154
+ | 0.4771 | 68.0 | 204816 | 0.0372 | 0.1304 | 0.1997 |
155
+ | 0.4756 | 69.0 | 207828 | 0.0377 | 0.1308 | 0.2009 |
156
+ | 0.4745 | 70.0 | 210840 | 0.0370 | 0.1312 | 0.1982 |
157
+ | 0.4738 | 71.0 | 213852 | 0.0374 | 0.1307 | 0.2001 |
158
+ | 0.473 | 72.0 | 216864 | 0.0372 | 0.1307 | 0.1991 |
159
+ | 0.472 | 73.0 | 219876 | 0.0366 | 0.1292 | 0.1961 |
160
+ | 0.4693 | 74.0 | 222888 | 0.0364 | 0.1287 | 0.1952 |
161
+ | 0.4693 | 75.0 | 225900 | 0.0363 | 0.1284 | 0.1945 |
162
+ | 0.4664 | 76.0 | 228912 | 0.0368 | 0.1288 | 0.1969 |
163
+ | 0.4651 | 77.0 | 231924 | 0.0368 | 0.1287 | 0.1971 |
164
+ | 0.4641 | 78.0 | 234936 | 0.0366 | 0.1287 | 0.1952 |
165
+ | 0.462 | 79.0 | 237948 | 0.0364 | 0.1287 | 0.1945 |
166
+ | 0.4608 | 80.0 | 240960 | 0.0363 | 0.1275 | 0.1952 |
167
+ | 0.4594 | 81.0 | 243972 | 0.0361 | 0.1277 | 0.1939 |
168
+ | 0.4595 | 82.0 | 246984 | 0.0359 | 0.1268 | 0.1937 |
169
+ | 0.4575 | 83.0 | 249996 | 0.0362 | 0.1272 | 0.1942 |
170
+ | 0.4569 | 84.0 | 253008 | 0.0361 | 0.1268 | 0.1934 |
171
+ | 0.4552 | 85.0 | 256020 | 0.0357 | 0.1262 | 0.1916 |
172
+ | 0.4538 | 86.0 | 259032 | 0.0355 | 0.1259 | 0.1907 |
173
+ | 0.4532 | 87.0 | 262044 | 0.0355 | 0.1258 | 0.1912 |
174
+ | 0.4524 | 88.0 | 265056 | 0.0356 | 0.1260 | 0.1910 |
175
+ | 0.4501 | 89.0 | 268068 | 0.0360 | 0.1266 | 0.1928 |
176
+ | 0.4491 | 90.0 | 271080 | 0.0355 | 0.1252 | 0.1904 |
177
+ | 0.4486 | 91.0 | 274092 | 0.0352 | 0.1253 | 0.1889 |
178
+ | 0.4487 | 92.0 | 277104 | 0.0354 | 0.1253 | 0.1902 |
179
+ | 0.4471 | 93.0 | 280116 | 0.0352 | 0.1252 | 0.1894 |
180
+ | 0.4458 | 94.0 | 283128 | 0.0352 | 0.1253 | 0.1891 |
181
+ | 0.4449 | 95.0 | 286140 | 0.0351 | 0.1248 | 0.1884 |
182
+ | 0.4434 | 96.0 | 289152 | 0.0351 | 0.1247 | 0.1891 |
183
+ | 0.4435 | 97.0 | 292164 | 0.0352 | 0.1247 | 0.1891 |
184
+ | 0.4444 | 98.0 | 295176 | 0.0351 | 0.1245 | 0.1888 |
185
+ | 0.4429 | 99.0 | 298188 | 0.0351 | 0.1244 | 0.1887 |
186
+ | 0.4426 | 100.0 | 301200 | 0.0350 | 0.1244 | 0.1884 |
187
+
188
+
189
+ ### Framework versions
190
+
191
+ - Transformers 4.51.3
192
+ - Pytorch 2.7.0+cu128
193
+ - Datasets 3.6.0
194
+ - Tokenizers 0.21.1
added_tokens.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "</s>": 30,
3
+ "<s>": 29
4
+ }
checkpoint-298188/config.json ADDED
@@ -0,0 +1,108 @@
1
+ {
2
+ "activation_dropout": 0.0,
3
+ "adapter_attn_dim": null,
4
+ "adapter_kernel_size": 3,
5
+ "adapter_stride": 2,
6
+ "add_adapter": false,
7
+ "apply_spec_augment": true,
8
+ "architectures": [
9
+ "Wav2Vec2ForCTC"
10
+ ],
11
+ "attention_dropout": 0.0,
12
+ "bos_token_id": 1,
13
+ "classifier_proj_size": 256,
14
+ "codevector_dim": 256,
15
+ "contrastive_logits_temperature": 0.1,
16
+ "conv_bias": true,
17
+ "conv_dim": [
18
+ 512,
19
+ 512,
20
+ 512,
21
+ 512,
22
+ 512,
23
+ 512,
24
+ 512
25
+ ],
26
+ "conv_kernel": [
27
+ 10,
28
+ 3,
29
+ 3,
30
+ 3,
31
+ 3,
32
+ 2,
33
+ 2
34
+ ],
35
+ "conv_stride": [
36
+ 5,
37
+ 2,
38
+ 2,
39
+ 2,
40
+ 2,
41
+ 2,
42
+ 2
43
+ ],
44
+ "ctc_loss_reduction": "mean",
45
+ "ctc_zero_infinity": false,
46
+ "diversity_loss_weight": 0.1,
47
+ "do_stable_layer_norm": true,
48
+ "eos_token_id": 2,
49
+ "feat_extract_activation": "gelu",
50
+ "feat_extract_dropout": 0.0,
51
+ "feat_extract_norm": "layer",
52
+ "feat_proj_dropout": 0.0,
53
+ "feat_quantizer_dropout": 0.0,
54
+ "final_dropout": 0.0,
55
+ "hidden_act": "gelu",
56
+ "hidden_dropout": 0.0,
57
+ "hidden_dropout_prob": 0.0,
58
+ "hidden_size": 768,
59
+ "initializer_range": 0.02,
60
+ "intermediate_size": 3072,
61
+ "layer_norm_eps": 1e-05,
62
+ "layerdrop": 0.0,
63
+ "mask_feature_length": 10,
64
+ "mask_feature_min_masks": 0,
65
+ "mask_feature_prob": 0.0,
66
+ "mask_time_length": 10,
67
+ "mask_time_min_masks": 2,
68
+ "mask_time_prob": 0.65,
69
+ "model_type": "wav2vec2",
70
+ "num_adapter_layers": 3,
71
+ "num_attention_heads": 12,
72
+ "num_codevector_groups": 2,
73
+ "num_codevectors_per_group": 320,
74
+ "num_conv_pos_embedding_groups": 16,
75
+ "num_conv_pos_embeddings": 128,
76
+ "num_feat_extract_layers": 7,
77
+ "num_hidden_layers": 12,
78
+ "num_negatives": 100,
79
+ "output_hidden_size": 768,
80
+ "pad_token_id": 28,
81
+ "proj_codevector_dim": 256,
82
+ "tdnn_dilation": [
83
+ 1,
84
+ 2,
85
+ 3,
86
+ 1,
87
+ 1
88
+ ],
89
+ "tdnn_dim": [
90
+ 512,
91
+ 512,
92
+ 512,
93
+ 512,
94
+ 1500
95
+ ],
96
+ "tdnn_kernel": [
97
+ 5,
98
+ 3,
99
+ 3,
100
+ 1,
101
+ 1
102
+ ],
103
+ "torch_dtype": "float32",
104
+ "transformers_version": "4.51.3",
105
+ "use_weighted_layer_sum": false,
106
+ "vocab_size": 32,
107
+ "xvector_output_dim": 512
108
+ }
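The `conv_kernel` and `conv_stride` lists in this config determine how many feature frames the convolutional front end produces per audio clip. The sketch below works through that arithmetic, assuming unpadded 1-D convolutions (the usual Wav2Vec2 layout) and the 16 kHz sampling rate of the accompanying preprocessor config; the function names are illustrative.

```python
import math

# Values copied from config.json above.
CONV_KERNEL = [10, 3, 3, 3, 3, 2, 2]
CONV_STRIDE = [5, 2, 2, 2, 2, 2, 2]


def num_output_frames(num_samples):
    """Frames left after applying each unpadded conv layer in turn."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNEL, CONV_STRIDE):
        length = (length - kernel) // stride + 1
    return length


def total_stride():
    """Overall downsampling factor: the product of all layer strides."""
    return math.prod(CONV_STRIDE)


print(total_stride())            # samples of hop between consecutive frames
print(num_output_frames(16000))  # frames for one second of 16 kHz audio
```

With a combined stride of 320 samples, each frame advances by 20 ms at 16 kHz, so the CTC head emits roughly 50 predictions per second over the 32-entry vocabulary.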
checkpoint-298188/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e74a62a2111486790e5a1b4735c67fc9b2cb04df54d348b3150eae08cff72c44
3
+ size 377652400
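The three lines above are a Git LFS pointer, not the weights themselves: they record the spec version, a SHA-256 object ID, and the file size in bytes. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:e74a62a2111486790e5a1b4735c67fc9b2cb04df54d348b3150eae08cff72c44
size 377652400
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
# Rough parameter estimate, assuming float32 weights and ignoring the file header.
print(info["size"] // 4)
```

At 4 bytes per float32 value, the recorded size corresponds to roughly 94 million parameters, which is in line with a base-sized Wav2Vec2 model.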
checkpoint-298188/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:290ea98235c2325aa74035dfb688a5dcce98f9cf5df37d63b86734add638f8d7
3
+ size 755442827
checkpoint-298188/preprocessor_config.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "do_normalize": true,
3
+ "feature_extractor_type": "Wav2Vec2FeatureExtractor",
4
+ "feature_size": 1,
5
+ "padding_side": "right",
6
+ "padding_value": 0.0,
7
+ "return_attention_mask": false,
8
+ "sampling_rate": 16000
9
+ }
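With `do_normalize` set to true, each input waveform is rescaled to zero mean and unit variance before it reaches the model. The following is a plain-Python sketch of that step; the small epsilon is an assumption modeled on common implementations, and the function name and sample values are illustrative only.

```python
import math


def zero_mean_unit_var(samples, eps=1e-7):
    """Normalize one waveform to zero mean and (approximately) unit variance."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    scale = math.sqrt(variance + eps)  # eps guards against silent (all-zero) input
    return [(x - mean) / scale for x in samples]


# Hypothetical four-sample waveform, just to exercise the function.
normalized = zero_mean_unit_var([0.1, -0.2, 0.4, 0.3])
print(normalized)
```

Audio passed to the model should also be resampled to the configured `sampling_rate` of 16000 Hz; this sketch covers only the normalization.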
checkpoint-298188/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b7413ad18b1596df9a07e4bb18475008eefd77f8c7f6e15f87a1e7860cb346aa
3
+ size 14709
checkpoint-298188/scaler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:147c7b74438102463ebbe91075d3c2f024a3a20baa7c27e696cc5a047d481e26
3
+ size 1383
checkpoint-298188/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3b3b0051297f53b8083c690174ee73be0a9e627ef19dfd082227f7b491e9a8c1
3
+ size 1465
checkpoint-298188/trainer_state.json ADDED
@@ -0,0 +1,1726 @@
1
+ {
2
+ "best_global_step": 286140,
3
+ "best_metric": 0.1884266993162269,
4
+ "best_model_checkpoint": "wav2vec2-asr-africa-base-fintuned-luganda-400hrs-v0.1/checkpoint-286140",
5
+ "epoch": 99.0,
6
+ "eval_steps": 500,
7
+ "global_step": 298188,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 1.0,
14
+ "grad_norm": 7.066422462463379,
15
+ "learning_rate": 6.98140770252324e-05,
16
+ "loss": 3.2575,
17
+ "step": 3012
18
+ },
19
+ {
20
+ "epoch": 1.0,
21
+ "eval_cer": 0.10930492502565166,
22
+ "eval_loss": 0.37255266308784485,
23
+ "eval_runtime": 151.5408,
24
+ "eval_samples_per_second": 239.869,
25
+ "eval_steps_per_second": 7.496,
26
+ "eval_wer": 0.534090083733671,
27
+ "step": 3012
28
+ },
29
+ {
30
+ "epoch": 2.0,
31
+ "grad_norm": 15.221166610717773,
32
+ "learning_rate": 6.929480730277542e-05,
33
+ "loss": 0.8396,
34
+ "step": 6024
35
+ },
36
+ {
37
+ "epoch": 2.0,
38
+ "eval_cer": 0.07810481279107405,
39
+ "eval_loss": 0.2538328468799591,
40
+ "eval_runtime": 152.0166,
41
+ "eval_samples_per_second": 239.119,
42
+ "eval_steps_per_second": 7.473,
43
+ "eval_wer": 0.3983557626127949,
44
+ "step": 6024
45
+ },
46
+ {
47
+ "epoch": 3.0,
48
+ "grad_norm": 7.060754299163818,
49
+ "learning_rate": 6.858773659570472e-05,
50
+ "loss": 0.7487,
51
+ "step": 9036
52
+ },
53
+ {
54
+ "epoch": 3.0,
55
+ "eval_cer": 0.0682448584368034,
56
+ "eval_loss": 0.22769133746623993,
57
+ "eval_runtime": 153.0969,
58
+ "eval_samples_per_second": 237.431,
59
+ "eval_steps_per_second": 7.42,
60
+ "eval_wer": 0.35308562554689743,
61
+ "step": 9036
62
+ },
63
+ {
64
+ "epoch": 4.0,
65
+ "grad_norm": 9.472733497619629,
66
+ "learning_rate": 6.788090063986478e-05,
67
+ "loss": 0.7226,
68
+ "step": 12048
69
+ },
70
+ {
71
+ "epoch": 4.0,
72
+ "eval_cer": 0.06425062788093966,
73
+ "eval_loss": 0.21326717734336853,
74
+ "eval_runtime": 171.3308,
75
+ "eval_samples_per_second": 212.163,
76
+ "eval_steps_per_second": 6.63,
77
+ "eval_wer": 0.33642189872410777,
78
+ "step": 12048
79
+ },
80
+ {
81
+ "epoch": 5.0,
82
+ "grad_norm": 23.48148536682129,
83
+ "learning_rate": 6.717429943525561e-05,
84
+ "loss": 0.7096,
85
+ "step": 15060
86
+ },
87
+ {
88
+ "epoch": 5.0,
89
+ "eval_cer": 0.06122744648913919,
90
+ "eval_loss": 0.20842401683330536,
91
+ "eval_runtime": 174.4012,
92
+ "eval_samples_per_second": 208.427,
93
+ "eval_steps_per_second": 6.514,
94
+ "eval_wer": 0.3211001241651708,
95
+ "step": 15060
96
+ },
97
+ {
98
+ "epoch": 6.0,
99
+ "grad_norm": 7.9525041580200195,
100
+ "learning_rate": 6.646746347941567e-05,
101
+ "loss": 0.6979,
102
+ "step": 18072
103
+ },
104
+ {
105
+ "epoch": 6.0,
106
+ "eval_cer": 0.06395399482657282,
107
+ "eval_loss": 0.20914477109909058,
108
+ "eval_runtime": 153.4786,
109
+ "eval_samples_per_second": 236.841,
110
+ "eval_steps_per_second": 7.402,
111
+ "eval_wer": 0.3287402596055075,
112
+ "step": 18072
113
+ },
114
+ {
115
+ "epoch": 7.0,
116
+ "grad_norm": 43.115169525146484,
117
+ "learning_rate": 6.57608622748065e-05,
118
+ "loss": 0.6899,
119
+ "step": 21084
120
+ },
121
+ {
122
+ "epoch": 7.0,
123
+ "eval_cer": 0.06076026089037598,
124
+ "eval_loss": 0.20185638964176178,
125
+ "eval_runtime": 173.3256,
126
+ "eval_samples_per_second": 209.721,
127
+ "eval_steps_per_second": 6.554,
128
+ "eval_wer": 0.31623727652863237,
129
+ "step": 21084
130
+ },
131
+ {
132
+ "epoch": 8.0,
133
+ "grad_norm": 4.6848602294921875,
134
+ "learning_rate": 6.505426107019733e-05,
135
+ "loss": 0.6765,
136
+ "step": 24096
137
+ },
138
+ {
139
+ "epoch": 8.0,
140
+ "eval_cer": 0.060137193933831115,
141
+ "eval_loss": 0.19728189706802368,
142
+ "eval_runtime": 154.1162,
143
+ "eval_samples_per_second": 235.861,
144
+ "eval_steps_per_second": 7.371,
145
+ "eval_wer": 0.31059969356450884,
146
+ "step": 24096
147
+ },
148
+ {
149
+ "epoch": 9.0,
150
+ "grad_norm": 5.830124378204346,
151
+ "learning_rate": 6.434719036312661e-05,
152
+ "loss": 0.6701,
153
+ "step": 27108
154
+ },
155
+ {
156
+ "epoch": 9.0,
157
+ "eval_cer": 0.05820976679231927,
158
+ "eval_loss": 0.19281432032585144,
159
+ "eval_runtime": 156.5256,
160
+ "eval_samples_per_second": 232.23,
161
+ "eval_steps_per_second": 7.258,
162
+ "eval_wer": 0.30471654717065966,
163
+ "step": 27108
164
+ },
165
+ {
166
+ "epoch": 10.0,
167
+ "grad_norm": Infinity,
168
+ "learning_rate": 6.364058915851744e-05,
169
+ "loss": 0.6621,
170
+ "step": 30120
171
+ },
172
+ {
173
+ "epoch": 10.0,
174
+ "eval_cer": 0.058245986284274416,
175
+ "eval_loss": 0.19237777590751648,
176
+ "eval_runtime": 155.0055,
177
+ "eval_samples_per_second": 234.508,
178
+ "eval_steps_per_second": 7.329,
179
+ "eval_wer": 0.3038691804061135,
180
+ "step": 30120
181
+ },
182
+ {
183
+ "epoch": 11.0,
184
+ "grad_norm": 5.588762283325195,
185
+ "learning_rate": 6.29337532026775e-05,
186
+ "loss": 0.6554,
187
+ "step": 33132
188
+ },
189
+ {
190
+ "epoch": 11.0,
191
+ "eval_cer": 0.05662298626501848,
192
+ "eval_loss": 0.18665704131126404,
193
+ "eval_runtime": 153.969,
194
+ "eval_samples_per_second": 236.087,
195
+ "eval_steps_per_second": 7.378,
196
+ "eval_wer": 0.2982627252006876,
197
+ "step": 33132
198
+ },
199
+ {
200
+ "epoch": 12.0,
201
+ "grad_norm": 5.955714702606201,
202
+ "learning_rate": 6.222691724683756e-05,
203
+ "loss": 0.6475,
204
+ "step": 36144
205
+ },
206
+ {
207
+ "epoch": 12.0,
208
+ "eval_cer": 0.05515495065438077,
209
+ "eval_loss": 0.1829417496919632,
210
+ "eval_runtime": 155.1287,
211
+ "eval_samples_per_second": 234.322,
212
+ "eval_steps_per_second": 7.323,
213
+ "eval_wer": 0.2873610923768119,
214
+ "step": 36144
215
+ },
216
+ {
217
+ "epoch": 13.0,
218
+ "grad_norm": 31.985713958740234,
219
+ "learning_rate": 6.152008129099762e-05,
220
+ "loss": 0.6429,
221
+ "step": 39156
222
+ },
223
+ {
224
+ "epoch": 13.0,
225
+ "eval_cer": 0.05419032013446143,
226
+ "eval_loss": 0.1801947057247162,
227
+ "eval_runtime": 168.8898,
228
+ "eval_samples_per_second": 215.229,
229
+ "eval_steps_per_second": 6.726,
230
+ "eval_wer": 0.28527553254407173,
231
+ "step": 39156
232
+ },
233
+ {
234
+ "epoch": 14.0,
235
+ "grad_norm": 6.557770252227783,
236
+ "learning_rate": 6.081324533515768e-05,
237
+ "loss": 0.6351,
238
+ "step": 42168
239
+ },
240
+ {
241
+ "epoch": 14.0,
242
+ "eval_cer": 0.055327337097104,
243
+ "eval_loss": 0.18261073529720306,
244
+ "eval_runtime": 159.2183,
245
+ "eval_samples_per_second": 228.303,
246
+ "eval_steps_per_second": 7.135,
247
+ "eval_wer": 0.2872746263804296,
248
+ "step": 42168
249
+ },
250
+ {
251
+ "epoch": 15.0,
252
+ "grad_norm": 8.820505142211914,
253
+ "learning_rate": 6.010640937931774e-05,
254
+ "loss": 0.6319,
255
+ "step": 45180
256
+ },
257
+ {
258
+ "epoch": 15.0,
259
+ "eval_cer": 0.05439250742499585,
260
+ "eval_loss": 0.17926117777824402,
261
+ "eval_runtime": 152.0308,
262
+ "eval_samples_per_second": 239.096,
263
+ "eval_steps_per_second": 7.472,
264
+ "eval_wer": 0.28315884495263394,
265
+ "step": 45180
266
+ },
267
+ {
268
+ "epoch": 16.0,
269
+ "grad_norm": 8.058792114257812,
270
+ "learning_rate": 5.93995734234778e-05,
271
+ "loss": 0.6251,
272
+ "step": 48192
273
+ },
274
+ {
275
+ "epoch": 16.0,
276
+ "eval_cer": 0.054798715904391546,
277
+ "eval_loss": 0.1785019189119339,
278
+ "eval_runtime": 154.575,
279
+ "eval_samples_per_second": 235.161,
280
+ "eval_steps_per_second": 7.349,
281
+ "eval_wer": 0.283826362444705,
282
+ "step": 48192
283
+ },
284
+ {
285
+ "epoch": 17.0,
286
+ "grad_norm": 7.038857936859131,
287
+ "learning_rate": 5.86925027164071e-05,
288
+ "loss": 0.6172,
289
+ "step": 51204
290
+ },
291
+ {
292
+ "epoch": 17.0,
293
+ "eval_cer": 0.051710431121988164,
294
+ "eval_loss": 0.17091116309165955,
295
+ "eval_runtime": 154.6152,
296
+ "eval_samples_per_second": 235.1,
297
+ "eval_steps_per_second": 7.347,
298
+ "eval_wer": 0.27192518270265037,
299
+ "step": 51204
300
+ },
301
+ {
302
+ "epoch": 18.0,
303
+ "grad_norm": NaN,
304
+ "learning_rate": 5.7985901511797926e-05,
305
+ "loss": 0.6122,
306
+ "step": 54216
307
+ },
308
+ {
309
+ "epoch": 18.0,
310
+ "eval_cer": 0.05208454638066411,
311
+ "eval_loss": 0.1720370054244995,
312
+ "eval_runtime": 154.3969,
313
+ "eval_samples_per_second": 235.432,
314
+ "eval_steps_per_second": 7.358,
315
+ "eval_wer": 0.27160698783596365,
316
+ "step": 54216
317
+ },
318
+ {
319
+ "epoch": 19.0,
320
+ "grad_norm": 4.123114109039307,
321
+ "learning_rate": 5.727930030718875e-05,
322
+ "loss": 0.6068,
323
+ "step": 57228
324
+ },
325
+ {
326
+ "epoch": 19.0,
327
+ "eval_cer": 0.0505266497520111,
328
+ "eval_loss": 0.16939722001552582,
329
+ "eval_runtime": 154.2835,
330
+ "eval_samples_per_second": 235.605,
331
+ "eval_steps_per_second": 7.363,
332
+ "eval_wer": 0.26646744901100194,
333
+ "step": 57228
334
+ },
335
+ {
336
+ "epoch": 20.0,
337
+ "grad_norm": 11.014168739318848,
338
+ "learning_rate": 5.657222960011804e-05,
339
+ "loss": 0.6035,
340
+ "step": 60240
341
+ },
342
+ {
343
+ "epoch": 20.0,
344
+ "eval_cer": 0.049698644657441546,
345
+ "eval_loss": 0.1669510304927826,
346
+ "eval_runtime": 155.9471,
347
+ "eval_samples_per_second": 233.092,
348
+ "eval_steps_per_second": 7.285,
349
+ "eval_wer": 0.26278053892526226,
350
+ "step": 60240
351
+ },
352
+ {
353
+ "epoch": 21.0,
354
+ "grad_norm": 7.567544937133789,
355
+ "learning_rate": 5.5865158893047335e-05,
356
+ "loss": 0.5957,
357
+ "step": 63252
358
+ },
359
+ {
360
+ "epoch": 21.0,
361
+ "eval_cer": 0.050415698903237105,
362
+ "eval_loss": 0.1704263538122177,
363
+ "eval_runtime": 155.0829,
364
+ "eval_samples_per_second": 234.391,
365
+ "eval_steps_per_second": 7.325,
366
+ "eval_wer": 0.2643818891782618,
367
+ "step": 63252
368
+ },
369
+ {
370
+ "epoch": 22.0,
371
+ "grad_norm": 3.353114366531372,
372
+ "learning_rate": 5.5158557688438164e-05,
373
+ "loss": 0.5909,
374
+ "step": 66264
375
+ },
376
+ {
377
+ "epoch": 22.0,
378
+ "eval_cer": 0.049318569229203364,
379
+ "eval_loss": 0.16528591513633728,
380
+ "eval_runtime": 155.149,
381
+ "eval_samples_per_second": 234.291,
382
+ "eval_steps_per_second": 7.322,
383
+ "eval_wer": 0.25990640920551583,
384
+ "step": 66264
385
+ },
386
+ {
387
+ "epoch": 23.0,
388
+ "grad_norm": 5.273142337799072,
389
+ "learning_rate": 5.445172173259822e-05,
390
+ "loss": 0.5879,
391
+ "step": 69276
392
+ },
393
+ {
394
+ "epoch": 23.0,
395
+ "eval_cer": 0.048735389561267335,
396
+ "eval_loss": 0.16745983064174652,
397
+ "eval_runtime": 155.8132,
398
+ "eval_samples_per_second": 233.292,
399
+ "eval_steps_per_second": 7.291,
400
+ "eval_wer": 0.2573400984328903,
401
+ "step": 69276
402
+ },
403
+ {
404
+ "epoch": 24.0,
405
+ "grad_norm": 12.098519325256348,
406
+ "learning_rate": 5.374512052798905e-05,
407
+ "loss": 0.5966,
408
+ "step": 72288
409
+ },
410
+ {
411
+ "epoch": 24.0,
412
+ "eval_cer": 0.05103463958854657,
413
+ "eval_loss": 0.19431033730506897,
414
+ "eval_runtime": 154.0761,
415
+ "eval_samples_per_second": 235.922,
416
+ "eval_steps_per_second": 7.373,
417
+ "eval_wer": 0.2738551037419025,
418
+ "step": 72288
419
+ },
420
+ {
421
+ "epoch": 25.0,
422
+ "grad_norm": 12.023294448852539,
423
+ "learning_rate": 5.3038519323379875e-05,
424
+ "loss": 0.6444,
425
+ "step": 75300
426
+ },
427
+ {
428
+ "epoch": 25.0,
429
+ "eval_cer": 0.05154996501838942,
430
+ "eval_loss": 0.1868334412574768,
431
+ "eval_runtime": 152.5369,
432
+ "eval_samples_per_second": 238.303,
433
+ "eval_steps_per_second": 7.447,
434
+ "eval_wer": 0.27229871580702175,
435
+ "step": 75300
436
+ },
437
+ {
438
+ "epoch": 26.0,
439
+ "grad_norm": 9.066435813903809,
440
+ "learning_rate": 5.2331448616309165e-05,
441
+ "loss": 0.5999,
442
+ "step": 78312
443
+ },
444
+ {
445
+ "epoch": 26.0,
446
+ "eval_cer": 0.04910904634536157,
447
+ "eval_loss": 0.16771361231803894,
448
+ "eval_runtime": 154.4851,
449
+ "eval_samples_per_second": 235.298,
450
+ "eval_steps_per_second": 7.353,
451
+ "eval_wer": 0.25782430801263095,
452
+ "step": 78312
453
+ },
454
+ {
455
+ "epoch": 27.0,
456
+ "grad_norm": 9.274683952331543,
457
+ "learning_rate": 5.1624847411699994e-05,
458
+ "loss": 0.5911,
459
+ "step": 81324
460
+ },
461
+ {
462
+ "epoch": 27.0,
463
+ "eval_cer": 0.04738747429103783,
464
+ "eval_loss": 0.16794191300868988,
465
+ "eval_runtime": 154.6471,
466
+ "eval_samples_per_second": 235.051,
467
+ "eval_steps_per_second": 7.346,
468
+ "eval_wer": 0.25102462205712983,
469
+ "step": 81324
470
+ },
471
+ {
472
+ "epoch": 28.0,
473
+ "grad_norm": 5.650504112243652,
474
+ "learning_rate": 5.091777670462929e-05,
475
+ "loss": 0.586,
476
+ "step": 84336
477
+ },
478
+ {
479
+ "epoch": 28.0,
480
+ "eval_cer": 0.04840666328618075,
481
+ "eval_loss": 0.1722731739282608,
482
+ "eval_runtime": 153.0438,
483
+ "eval_samples_per_second": 237.514,
484
+ "eval_steps_per_second": 7.423,
485
+ "eval_wer": 0.25386416537832335,
486
+ "step": 84336
487
+ },
488
+ {
489
+ "epoch": 29.0,
490
+ "grad_norm": 18.539613723754883,
491
+ "learning_rate": 5.021070599755859e-05,
492
+ "loss": 0.5816,
493
+ "step": 87348
494
+ },
495
+ {
496
+ "epoch": 29.0,
497
+ "eval_cer": 0.04769969548118283,
498
+ "eval_loss": 0.16775010526180267,
499
+ "eval_runtime": 156.0962,
500
+ "eval_samples_per_second": 232.869,
501
+ "eval_steps_per_second": 7.278,
502
+ "eval_wer": 0.25264326550940575,
503
+ "step": 87348
504
+ },
505
+ {
506
+ "epoch": 30.0,
507
+ "grad_norm": 4.343358039855957,
508
+ "learning_rate": 4.950457429541094e-05,
509
+ "loss": 0.5886,
510
+ "step": 90360
511
+ },
512
+ {
513
+ "epoch": 30.0,
514
+ "eval_cer": 0.04993246669411401,
515
+ "eval_loss": 0.18236766755580902,
516
+ "eval_runtime": 151.7294,
517
+ "eval_samples_per_second": 239.571,
518
+ "eval_steps_per_second": 7.487,
519
+ "eval_wer": 0.2629396363586056,
520
+ "step": 90360
521
+ },
522
+ {
523
+ "epoch": 31.0,
524
+ "grad_norm": 13.675621032714844,
525
+ "learning_rate": 4.879773833957101e-05,
526
+ "loss": 0.5978,
527
+ "step": 93372
528
+ },
529
+ {
530
+ "epoch": 31.0,
531
+ "eval_cer": 0.04701886072734242,
532
+ "eval_loss": 0.16201142966747284,
533
+ "eval_runtime": 152.4808,
534
+ "eval_samples_per_second": 238.391,
535
+ "eval_steps_per_second": 7.45,
536
+ "eval_wer": 0.24908778373816712,
537
+ "step": 93372
538
+ },
539
+ {
540
+ "epoch": 32.0,
541
+ "grad_norm": 5.842775821685791,
542
+ "learning_rate": 4.809066763250029e-05,
543
+ "loss": 0.5722,
544
+ "step": 96384
545
+ },
546
+ {
547
+ "epoch": 32.0,
548
+ "eval_cer": 0.04652920987407537,
549
+ "eval_loss": 0.15837915241718292,
550
+ "eval_runtime": 153.9577,
551
+ "eval_samples_per_second": 236.104,
552
+ "eval_steps_per_second": 7.379,
553
+ "eval_wer": 0.24719590773732322,
554
+ "step": 96384
555
+ },
556
+ {
557
+ "epoch": 33.0,
558
+ "grad_norm": 12.179231643676758,
559
+ "learning_rate": 4.738359692542959e-05,
560
+ "loss": 0.5615,
561
+ "step": 99396
562
+ },
563
+ {
564
+ "epoch": 33.0,
565
+ "eval_cer": 0.046122542920097966,
566
+ "eval_loss": 0.15639054775238037,
567
+ "eval_runtime": 165.4748,
568
+ "eval_samples_per_second": 219.671,
569
+ "eval_steps_per_second": 6.865,
570
+ "eval_wer": 0.2421047898703356,
571
+ "step": 99396
572
+ },
573
+ {
574
+ "epoch": 34.0,
575
+ "grad_norm": 11.811565399169922,
576
+ "learning_rate": 4.6676760969589654e-05,
577
+ "loss": 0.5566,
578
+ "step": 102408
579
+ },
580
+ {
581
+ "epoch": 34.0,
582
+ "eval_cer": 0.04475858103950859,
583
+ "eval_loss": 0.15303878486156464,
584
+ "eval_runtime": 151.973,
585
+ "eval_samples_per_second": 239.187,
586
+ "eval_steps_per_second": 7.475,
587
+ "eval_wer": 0.23680615361203053,
588
+ "step": 102408
589
+ },
590
+ {
591
+ "epoch": 35.0,
592
+ "grad_norm": 27.314680099487305,
593
+ "learning_rate": 4.5970394516211246e-05,
594
+ "loss": 0.5514,
595
+ "step": 105420
596
+ },
597
+ {
598
+ "epoch": 35.0,
599
+ "eval_cer": 0.04322590051284967,
600
+ "eval_loss": 0.14988180994987488,
601
+ "eval_runtime": 152.0916,
602
+ "eval_samples_per_second": 239.001,
603
+ "eval_steps_per_second": 7.469,
604
+ "eval_wer": 0.2308780449000626,
605
+ "step": 105420
606
+ },
607
+ {
608
+ "epoch": 36.0,
609
+ "grad_norm": 6.874896049499512,
610
+ "learning_rate": 4.526332380914054e-05,
611
+ "loss": 0.5485,
612
+ "step": 108432
613
+ },
614
+ {
615
+ "epoch": 36.0,
616
+ "eval_cer": 0.043550500516700855,
617
+ "eval_loss": 0.15108104050159454,
618
+ "eval_runtime": 156.7097,
619
+ "eval_samples_per_second": 231.958,
620
+ "eval_steps_per_second": 7.249,
621
+ "eval_wer": 0.23083308258194382,
622
+ "step": 108432
623
+ },
624
+ {
625
+ "epoch": 37.0,
626
+ "grad_norm": 18.988279342651367,
627
+ "learning_rate": 4.4556722604531365e-05,
628
+ "loss": 0.5451,
629
+ "step": 111444
630
+ },
631
+ {
632
+ "epoch": 37.0,
633
+ "eval_cer": 0.04385080136772137,
634
+ "eval_loss": 0.15071320533752441,
635
+ "eval_runtime": 172.1252,
636
+ "eval_samples_per_second": 211.183,
637
+ "eval_steps_per_second": 6.6,
638
+ "eval_wer": 0.23185338133925454,
639
+ "step": 111444
640
+ },
641
+ {
642
+ "epoch": 38.0,
643
+ "grad_norm": 28.172042846679688,
644
+ "learning_rate": 4.384988664869143e-05,
645
+ "loss": 0.5433,
646
+ "step": 114456
647
+ },
648
+ {
649
+ "epoch": 38.0,
650
+ "eval_cer": 0.0433771971248142,
651
+ "eval_loss": 0.14824804663658142,
652
+ "eval_runtime": 152.6042,
653
+ "eval_samples_per_second": 238.198,
654
+ "eval_steps_per_second": 7.444,
655
+ "eval_wer": 0.23122390888559166,
656
+ "step": 114456
657
+ },
658
+ {
659
+ "epoch": 39.0,
660
+ "grad_norm": 4.653916835784912,
661
+ "learning_rate": 4.3143050692851484e-05,
662
+ "loss": 0.5391,
663
+ "step": 117468
664
+ },
665
+ {
666
+ "epoch": 39.0,
667
+ "eval_cer": 0.04353903865215809,
668
+ "eval_loss": 0.14684619009494781,
669
+ "eval_runtime": 156.6046,
670
+ "eval_samples_per_second": 232.113,
671
+ "eval_steps_per_second": 7.254,
672
+ "eval_wer": 0.22910722129415387,
673
+ "step": 117468
674
+ },
675
+ {
676
+ "epoch": 40.0,
677
+ "grad_norm": 5.8400702476501465,
678
+ "learning_rate": 4.243621473701154e-05,
679
+ "loss": 0.5347,
680
+ "step": 120480
681
+ },
682
+ {
683
+ "epoch": 40.0,
684
+ "eval_cer": 0.042970530170836796,
685
+ "eval_loss": 0.1462726891040802,
686
+ "eval_runtime": 178.4468,
687
+ "eval_samples_per_second": 203.702,
688
+ "eval_steps_per_second": 6.366,
689
+ "eval_wer": 0.22744707416361443,
690
+ "step": 120480
691
+ },
692
+ {
693
+ "epoch": 41.0,
694
+ "grad_norm": 16.060422897338867,
695
+ "learning_rate": 4.172914402994084e-05,
696
+ "loss": 0.5313,
697
+ "step": 123492
698
+ },
699
+ {
700
+ "epoch": 41.0,
701
+ "eval_cer": 0.04216269795786252,
702
+ "eval_loss": 0.14503081142902374,
703
+ "eval_runtime": 152.5692,
704
+ "eval_samples_per_second": 238.253,
705
+ "eval_steps_per_second": 7.446,
706
+ "eval_wer": 0.22400226886774507,
707
+ "step": 123492
708
+ },
709
+ {
710
+ "epoch": 42.0,
711
+ "grad_norm": 8.755998611450195,
712
+ "learning_rate": 4.102254282533167e-05,
713
+ "loss": 0.5291,
714
+ "step": 126504
715
+ },
716
+ {
717
+ "epoch": 42.0,
718
+ "eval_cer": 0.04194263015864137,
719
+ "eval_loss": 0.1446152627468109,
720
+ "eval_runtime": 152.4116,
721
+ "eval_samples_per_second": 238.499,
722
+ "eval_steps_per_second": 7.454,
723
+ "eval_wer": 0.224061065745285,
724
+ "step": 126504
725
+ },
726
+ {
727
+ "epoch": 43.0,
728
+ "grad_norm": 9.50841999053955,
729
+ "learning_rate": 4.031570686949173e-05,
730
+ "loss": 0.5269,
731
+ "step": 129516
732
+ },
733
+ {
734
+ "epoch": 43.0,
735
+ "eval_cer": 0.04270048864220919,
736
+ "eval_loss": 0.14530107378959656,
737
+ "eval_runtime": 153.731,
738
+ "eval_samples_per_second": 236.452,
739
+ "eval_steps_per_second": 7.39,
740
+ "eval_wer": 0.22547219080624353,
741
+ "step": 129516
742
+ },
743
+ {
744
+ "epoch": 44.0,
745
+ "grad_norm": 28.027828216552734,
746
+ "learning_rate": 3.960887091365178e-05,
747
+ "loss": 0.5253,
748
+ "step": 132528
749
+ },
750
+ {
751
+ "epoch": 44.0,
752
+ "eval_cer": 0.042549192030244654,
753
+ "eval_loss": 0.14459766447544098,
754
+ "eval_runtime": 179.3972,
755
+ "eval_samples_per_second": 202.623,
756
+ "eval_steps_per_second": 6.332,
757
+ "eval_wer": 0.22531309337290018,
758
+ "step": 132528
759
+ },
760
+ {
761
+ "epoch": 45.0,
762
+ "grad_norm": 9.71485710144043,
763
+ "learning_rate": 3.890226970904261e-05,
764
+ "loss": 0.523,
765
+ "step": 135540
766
+ },
767
+ {
768
+ "epoch": 45.0,
769
+ "eval_cer": 0.041189356420890666,
770
+ "eval_loss": 0.1429988592863083,
771
+ "eval_runtime": 151.6718,
772
+ "eval_samples_per_second": 239.662,
773
+ "eval_steps_per_second": 7.49,
774
+ "eval_wer": 0.22018738910735963,
775
+ "step": 135540
776
+ },
777
+ {
778
+ "epoch": 46.0,
779
+ "grad_norm": 13.113300323486328,
780
+ "learning_rate": 3.819519900197191e-05,
781
+ "loss": 0.5192,
782
+ "step": 138552
783
+ },
784
+ {
785
+ "epoch": 46.0,
786
+ "eval_cer": 0.040866590315366325,
787
+ "eval_loss": 0.14137160778045654,
788
+ "eval_runtime": 152.9576,
789
+ "eval_samples_per_second": 237.648,
790
+ "eval_steps_per_second": 7.427,
791
+ "eval_wer": 0.21718528971296747,
792
+ "step": 138552
793
+ },
794
+ {
795
+ "epoch": 47.0,
796
+ "grad_norm": 15.17225456237793,
797
+ "learning_rate": 3.7488597797362736e-05,
798
+ "loss": 0.518,
799
+ "step": 141564
800
+ },
801
+ {
802
+ "epoch": 47.0,
803
+ "eval_cer": 0.040502561497488015,
804
+ "eval_loss": 0.14037571847438812,
805
+ "eval_runtime": 151.9659,
806
+ "eval_samples_per_second": 239.198,
807
+ "eval_steps_per_second": 7.475,
808
+ "eval_wer": 0.21598168304332638,
809
+ "step": 141564
810
+ },
811
+ {
812
+ "epoch": 48.0,
813
+ "grad_norm": 5.159524917602539,
814
+ "learning_rate": 3.678152709029203e-05,
815
+ "loss": 0.5139,
816
+ "step": 144576
817
+ },
818
+ {
819
+ "epoch": 48.0,
820
+ "eval_cer": 0.04006930301777139,
821
+ "eval_loss": 0.13999390602111816,
822
+ "eval_runtime": 161.9319,
823
+ "eval_samples_per_second": 224.477,
824
+ "eval_steps_per_second": 7.015,
825
+ "eval_wer": 0.2143319118323528,
826
+ "step": 144576
827
+ },
828
+ {
829
+ "epoch": 49.0,
830
+ "grad_norm": 5.24137020111084,
831
+ "learning_rate": 3.6074925885682855e-05,
832
+ "loss": 0.5133,
833
+ "step": 147588
834
+ },
835
+ {
836
+ "epoch": 49.0,
837
+ "eval_cer": 0.04118523014965527,
838
+ "eval_loss": 0.14138683676719666,
839
+ "eval_runtime": 154.3411,
840
+ "eval_samples_per_second": 235.517,
841
+ "eval_steps_per_second": 7.36,
842
+ "eval_wer": 0.21796694232026315,
843
+ "step": 147588
844
+ },
845
+ {
846
+ "epoch": 50.0,
847
+ "grad_norm": 4.531148910522461,
848
+ "learning_rate": 3.5367855178612144e-05,
849
+ "loss": 0.5114,
850
+ "step": 150600
851
+ },
852
+ {
853
+ "epoch": 50.0,
854
+ "eval_cer": 0.04037235471628217,
855
+ "eval_loss": 0.14019279181957245,
856
+ "eval_runtime": 152.1041,
857
+ "eval_samples_per_second": 238.981,
858
+ "eval_steps_per_second": 7.469,
859
+ "eval_wer": 0.21485070781064639,
860
+ "step": 150600
861
+ },
862
+ {
863
+ "epoch": 51.0,
864
+ "grad_norm": 6.761490821838379,
865
+ "learning_rate": 3.4661019222772204e-05,
866
+ "loss": 0.5087,
867
+ "step": 153612
868
+ },
869
+ {
870
+ "epoch": 51.0,
871
+ "eval_cer": 0.04063826997367439,
872
+ "eval_loss": 0.14041763544082642,
873
+ "eval_runtime": 154.2132,
874
+ "eval_samples_per_second": 235.713,
875
+ "eval_steps_per_second": 7.366,
876
+ "eval_wer": 0.21654889997959403,
877
+ "step": 153612
878
+ },
879
+ {
880
+ "epoch": 52.0,
881
+ "grad_norm": 4.379857540130615,
882
+ "learning_rate": 3.395441801816303e-05,
883
+ "loss": 0.5066,
884
+ "step": 156624
885
+ },
886
+ {
887
+ "epoch": 52.0,
888
+ "eval_cer": 0.04042691319150575,
889
+ "eval_loss": 0.13891662657260895,
890
+ "eval_runtime": 159.6511,
891
+ "eval_samples_per_second": 227.684,
892
+ "eval_steps_per_second": 7.116,
893
+ "eval_wer": 0.215715367774469,
894
+ "step": 156624
895
+ },
896
+ {
897
+ "epoch": 53.0,
898
+ "grad_norm": 19.332077026367188,
899
+ "learning_rate": 3.324734731109233e-05,
900
+ "loss": 0.5037,
901
+ "step": 159636
902
+ },
903
+ {
904
+ "epoch": 53.0,
905
+ "eval_cer": 0.03977129453965943,
906
+ "eval_loss": 0.1375364065170288,
907
+ "eval_runtime": 164.6123,
908
+ "eval_samples_per_second": 220.822,
909
+ "eval_steps_per_second": 6.901,
910
+ "eval_wer": 0.2132078538793834,
911
+ "step": 159636
912
+ },
913
+ {
914
+ "epoch": 54.0,
915
+ "grad_norm": 11.953184127807617,
916
+ "learning_rate": 3.254098085771392e-05,
917
+ "loss": 0.5024,
918
+ "step": 162648
919
+ },
920
+ {
921
+ "epoch": 54.0,
922
+ "eval_cer": 0.039842816574406296,
923
+ "eval_loss": 0.13721118867397308,
924
+ "eval_runtime": 156.3278,
925
+ "eval_samples_per_second": 232.524,
926
+ "eval_steps_per_second": 7.267,
927
+ "eval_wer": 0.21213221688438805,
928
+ "step": 162648
929
+ },
930
+ {
931
+ "epoch": 55.0,
932
+ "grad_norm": 7.574815273284912,
933
+ "learning_rate": 3.183391015064322e-05,
934
+ "loss": 0.5,
935
+ "step": 165660
936
+ },
937
+ {
938
+ "epoch": 55.0,
939
+ "eval_cer": 0.04010139623849114,
940
+ "eval_loss": 0.13785392045974731,
941
+ "eval_runtime": 152.1653,
942
+ "eval_samples_per_second": 238.885,
943
+ "eval_steps_per_second": 7.466,
944
+ "eval_wer": 0.2131732674808305,
945
+ "step": 165660
946
+ },
947
+ {
948
+ "epoch": 56.0,
949
+ "grad_norm": 11.841280937194824,
950
+ "learning_rate": 3.112707419480328e-05,
951
+ "loss": 0.4976,
952
+ "step": 168672
953
+ },
954
+ {
955
+ "epoch": 56.0,
956
+ "eval_cer": 0.038647114865304755,
957
+ "eval_loss": 0.13485907018184662,
958
+ "eval_runtime": 153.6401,
959
+ "eval_samples_per_second": 236.592,
960
+ "eval_steps_per_second": 7.394,
961
+ "eval_wer": 0.20721403101016495,
962
+ "step": 168672
963
+ },
964
+ {
965
+ "epoch": 57.0,
966
+ "grad_norm": 14.78765869140625,
967
+ "learning_rate": 3.0420238238963334e-05,
968
+ "loss": 0.4948,
969
+ "step": 171684
970
+ },
971
+ {
972
+ "epoch": 57.0,
973
+ "eval_cer": 0.0392779758897387,
974
+ "eval_loss": 0.13624149560928345,
975
+ "eval_runtime": 152.0309,
976
+ "eval_samples_per_second": 239.096,
977
+ "eval_steps_per_second": 7.472,
978
+ "eval_wer": 0.2102472581632547,
979
+ "step": 171684
980
+ },
981
+ {
982
+ "epoch": 58.0,
983
+ "grad_norm": 7.023338794708252,
984
+ "learning_rate": 2.9713402283123397e-05,
985
+ "loss": 0.4933,
986
+ "step": 174696
987
+ },
988
+ {
989
+ "epoch": 58.0,
990
+ "eval_cer": 0.038942372495926456,
991
+ "eval_loss": 0.13551433384418488,
992
+ "eval_runtime": 151.9877,
993
+ "eval_samples_per_second": 239.164,
994
+ "eval_steps_per_second": 7.474,
995
+ "eval_wer": 0.20676094918912188,
996
+ "step": 174696
997
+ },
998
+ {
999
+ "epoch": 59.0,
1000
+ "grad_norm": 6.568221092224121,
1001
+ "learning_rate": 2.9006801078514222e-05,
1002
+ "loss": 0.4924,
1003
+ "step": 177708
1004
+ },
1005
+ {
1006
+ "epoch": 59.0,
1007
+ "eval_cer": 0.03848756571086942,
1008
+ "eval_loss": 0.13611619174480438,
1009
+ "eval_runtime": 177.5315,
1010
+ "eval_samples_per_second": 204.752,
1011
+ "eval_steps_per_second": 6.399,
1012
+ "eval_wer": 0.20549508700208555,
1013
+ "step": 177708
1014
+ },
1015
+ {
1016
+ "epoch": 60.0,
1017
+ "grad_norm": 23.931304931640625,
1018
+ "learning_rate": 2.8300199873905052e-05,
1019
+ "loss": 0.4901,
1020
+ "step": 180720
1021
+ },
1022
+ {
1023
+ "epoch": 60.0,
1024
+ "eval_cer": 0.03840274791325294,
1025
+ "eval_loss": 0.13464532792568207,
1026
+ "eval_runtime": 153.1916,
1027
+ "eval_samples_per_second": 237.285,
1028
+ "eval_steps_per_second": 7.416,
1029
+ "eval_wer": 0.2053671173274398,
1030
+ "step": 180720
1031
+ },
1032
+ {
1033
+ "epoch": 61.0,
1034
+ "grad_norm": 18.084495544433594,
1035
+ "learning_rate": 2.759312916683434e-05,
1036
+ "loss": 0.4898,
1037
+ "step": 183732
1038
+ },
1039
+ {
1040
+ "epoch": 61.0,
1041
+ "eval_cer": 0.038370654692533195,
1042
+ "eval_loss": 0.13341517746448517,
1043
+ "eval_runtime": 151.6501,
1044
+ "eval_samples_per_second": 239.696,
1045
+ "eval_steps_per_second": 7.491,
1046
+ "eval_wer": 0.2050074187824896,
1047
+ "step": 183732
1048
+ },
1049
+ {
1050
+ "epoch": 62.0,
1051
+ "grad_norm": 2.890596866607666,
1052
+ "learning_rate": 2.6886293210994404e-05,
1053
+ "loss": 0.4873,
1054
+ "step": 186744
1055
+ },
1056
+ {
1057
+ "epoch": 62.0,
1058
+ "eval_cer": 0.038351857234683054,
1059
+ "eval_loss": 0.1341981142759323,
1060
+ "eval_runtime": 150.7747,
1061
+ "eval_samples_per_second": 241.088,
1062
+ "eval_steps_per_second": 7.534,
1063
+ "eval_wer": 0.20600696570066857,
1064
+ "step": 186744
1065
+ },
1066
+ {
1067
+ "epoch": 63.0,
1068
+ "grad_norm": 11.759881973266602,
1069
+ "learning_rate": 2.617969200638523e-05,
1070
+ "loss": 0.4865,
1071
+ "step": 189756
1072
+ },
1073
+ {
1074
+ "epoch": 63.0,
1075
+ "eval_cer": 0.03869296232347583,
1076
+ "eval_loss": 0.13458400964736938,
1077
+ "eval_runtime": 152.6729,
1078
+ "eval_samples_per_second": 238.091,
1079
+ "eval_steps_per_second": 7.441,
1080
+ "eval_wer": 0.20699613669928163,
1081
+ "step": 189756
1082
+ },
1083
+ {
1084
+ "epoch": 64.0,
1085
+ "grad_norm": 13.245360374450684,
1086
+ "learning_rate": 2.547309080177606e-05,
1087
+ "loss": 0.4842,
1088
+ "step": 192768
1089
+ },
1090
+ {
1091
+ "epoch": 64.0,
1092
+ "eval_cer": 0.03874110215455545,
1093
+ "eval_loss": 0.13456492125988007,
1094
+ "eval_runtime": 153.3987,
1095
+ "eval_samples_per_second": 236.964,
1096
+ "eval_steps_per_second": 7.406,
1097
+ "eval_wer": 0.2072278655695861,
1098
+ "step": 192768
1099
+ },
1100
+ {
1101
+ "epoch": 65.0,
1102
+ "grad_norm": 11.684355735778809,
1103
+ "learning_rate": 2.4766020094705352e-05,
1104
+ "loss": 0.4822,
1105
+ "step": 195780
1106
+ },
1107
+ {
1108
+ "epoch": 65.0,
1109
+ "eval_cer": 0.03811941062175572,
1110
+ "eval_loss": 0.13252592086791992,
1111
+ "eval_runtime": 156.7414,
1112
+ "eval_samples_per_second": 231.911,
1113
+ "eval_steps_per_second": 7.248,
1114
+ "eval_wer": 0.20395599226648128,
1115
+ "step": 195780
1116
+ },
1117
+ {
1118
+ "epoch": 66.0,
1119
+ "grad_norm": 25.19974708557129,
1120
+ "learning_rate": 2.405918413886541e-05,
1121
+ "loss": 0.4814,
1122
+ "step": 198792
1123
+ },
1124
+ {
1125
+ "epoch": 66.0,
1126
+ "eval_cer": 0.037090135185815165,
1127
+ "eval_loss": 0.13119570910930634,
1128
+ "eval_runtime": 154.348,
1129
+ "eval_samples_per_second": 235.507,
1130
+ "eval_steps_per_second": 7.36,
1131
+ "eval_wer": 0.19890983671761242,
1132
+ "step": 198792
1133
+ },
1134
+ {
1135
+ "epoch": 67.0,
1136
+ "grad_norm": 12.580814361572266,
1137
+ "learning_rate": 2.335234818302547e-05,
1138
+ "loss": 0.4796,
1139
+ "step": 201804
1140
+ },
1141
+ {
1142
+ "epoch": 67.0,
1143
+ "eval_cer": 0.03740739959635898,
1144
+ "eval_loss": 0.13117973506450653,
1145
+ "eval_runtime": 162.9804,
1146
+ "eval_samples_per_second": 223.033,
1147
+ "eval_steps_per_second": 6.97,
1148
+ "eval_wer": 0.1999750977930419,
1149
+ "step": 201804
1150
+ },
1151
+ {
1152
+ "epoch": 68.0,
1153
+ "grad_norm": 11.110360145568848,
1154
+ "learning_rate": 2.2645746978416297e-05,
1155
+ "loss": 0.4771,
1156
+ "step": 204816
1157
+ },
1158
+ {
1159
+ "epoch": 68.0,
1160
+ "eval_cer": 0.037213006373713636,
1161
+ "eval_loss": 0.1303921490907669,
1162
+ "eval_runtime": 152.708,
1163
+ "eval_samples_per_second": 238.036,
1164
+ "eval_steps_per_second": 7.439,
1165
+ "eval_wer": 0.1997191584437504,
1166
+ "step": 204816
1167
+ },
1168
+ {
1169
+ "epoch": 69.0,
1170
+ "grad_norm": 4.782271385192871,
1171
+ "learning_rate": 2.193891102257636e-05,
1172
+ "loss": 0.4756,
1173
+ "step": 207828
1174
+ },
1175
+ {
1176
+ "epoch": 69.0,
1177
+ "eval_cer": 0.037708617396542916,
1178
+ "eval_loss": 0.13083402812480927,
1179
+ "eval_runtime": 152.8061,
1180
+ "eval_samples_per_second": 237.883,
1181
+ "eval_steps_per_second": 7.434,
1182
+ "eval_wer": 0.20086396823585156,
1183
+ "step": 207828
1184
+ },
1185
+ {
1186
+ "epoch": 70.0,
1187
+ "grad_norm": 7.983453750610352,
1188
+ "learning_rate": 2.1232544569197956e-05,
1189
+ "loss": 0.4745,
1190
+ "step": 210840
1191
+ },
1192
+ {
1193
+ "epoch": 70.0,
1194
+ "eval_cer": 0.0370488724734612,
1195
+ "eval_loss": 0.13116249442100525,
1196
+ "eval_runtime": 151.9447,
1197
+ "eval_samples_per_second": 239.232,
1198
+ "eval_steps_per_second": 7.476,
1199
+ "eval_wer": 0.19823886058568607,
1200
+ "step": 210840
1201
+ },
1202
+ {
1203
+ "epoch": 71.0,
1204
+ "grad_norm": 23.63794708251953,
1205
+ "learning_rate": 2.052547386212725e-05,
1206
+ "loss": 0.4738,
1207
+ "step": 213852
1208
+ },
1209
+ {
1210
+ "epoch": 71.0,
1211
+ "eval_cer": 0.037366136884005016,
1212
+ "eval_loss": 0.1306936889886856,
1213
+ "eval_runtime": 154.0224,
1214
+ "eval_samples_per_second": 236.005,
1215
+ "eval_steps_per_second": 7.376,
1216
+ "eval_wer": 0.20006848106913475,
1217
+ "step": 213852
1218
+ },
1219
+ {
1220
+ "epoch": 72.0,
1221
+ "grad_norm": 10.838956832885742,
1222
+ "learning_rate": 1.9818637906287305e-05,
1223
+ "loss": 0.473,
1224
+ "step": 216864
1225
+ },
1226
+ {
1227
+ "epoch": 72.0,
1228
+ "eval_cer": 0.0372285945094918,
1229
+ "eval_loss": 0.13071005046367645,
1230
+ "eval_runtime": 154.8642,
1231
+ "eval_samples_per_second": 234.722,
1232
+ "eval_steps_per_second": 7.335,
1233
+ "eval_wer": 0.19911043782921928,
1234
+ "step": 216864
1235
+ },
1236
+ {
1237
+ "epoch": 73.0,
1238
+ "grad_norm": 11.969744682312012,
1239
+ "learning_rate": 1.9111801950447367e-05,
1240
+ "loss": 0.472,
1241
+ "step": 219876
1242
+ },
1243
+ {
1244
+ "epoch": 73.0,
1245
+ "eval_cer": 0.03662890975661418,
1246
+ "eval_loss": 0.12924158573150635,
1247
+ "eval_runtime": 154.3055,
1248
+ "eval_samples_per_second": 235.572,
1249
+ "eval_steps_per_second": 7.362,
1250
+ "eval_wer": 0.19607375203627422,
1251
+ "step": 219876
1252
+ },
1253
+ {
1254
+ "epoch": 74.0,
1255
+ "grad_norm": 6.115599155426025,
1256
+ "learning_rate": 1.840496599460743e-05,
1257
+ "loss": 0.4693,
1258
+ "step": 222888
1259
+ },
1260
+ {
1261
+ "epoch": 74.0,
1262
+ "eval_cer": 0.036412051279465014,
1263
+ "eval_loss": 0.12866230309009552,
1264
+ "eval_runtime": 157.4725,
1265
+ "eval_samples_per_second": 230.834,
1266
+ "eval_steps_per_second": 7.214,
1267
+ "eval_wer": 0.19521600935216216,
1268
+ "step": 222888
1269
+ },
1270
+ {
1271
+ "epoch": 75.0,
1272
+ "grad_norm": 44.51272964477539,
1273
+ "learning_rate": 1.7698130038767486e-05,
1274
+ "loss": 0.4693,
1275
+ "step": 225900
1276
+ },
1277
+ {
1278
+ "epoch": 75.0,
1279
+ "eval_cer": 0.03628138602367746,
1280
+ "eval_loss": 0.12844808399677277,
1281
+ "eval_runtime": 164.6927,
1282
+ "eval_samples_per_second": 220.714,
1283
+ "eval_steps_per_second": 6.898,
1284
+ "eval_wer": 0.1944724017832747,
1285
+ "step": 225900
1286
+ },
1287
+ {
1288
+ "epoch": 76.0,
1289
+ "grad_norm": 9.162590026855469,
1290
+ "learning_rate": 1.6991528834158312e-05,
1291
+ "loss": 0.4664,
1292
+ "step": 228912
1293
+ },
1294
+ {
1295
+ "epoch": 76.0,
1296
+ "eval_cer": 0.03683797416587427,
1297
+ "eval_loss": 0.12876588106155396,
1298
+ "eval_runtime": 152.2551,
1299
+ "eval_samples_per_second": 238.744,
1300
+ "eval_steps_per_second": 7.461,
1301
+ "eval_wer": 0.19688999104212276,
1302
+ "step": 228912
1303
+ },
1304
+ {
1305
+ "epoch": 77.0,
1306
+ "grad_norm": 3.896597146987915,
1307
+ "learning_rate": 1.6284692878318375e-05,
1308
+ "loss": 0.4651,
1309
+ "step": 231924
1310
+ },
1311
+ {
1312
+ "epoch": 77.0,
1313
+ "eval_cer": 0.03683384789463887,
1314
+ "eval_loss": 0.12869854271411896,
1315
+ "eval_runtime": 168.0203,
1316
+ "eval_samples_per_second": 216.343,
1317
+ "eval_steps_per_second": 6.761,
1318
+ "eval_wer": 0.1971009680732955,
1319
+ "step": 231924
1320
+ },
1321
+ {
1322
+ "epoch": 78.0,
1323
+ "grad_norm": 15.388148307800293,
1324
+ "learning_rate": 1.5577856922478434e-05,
1325
+ "loss": 0.4641,
1326
+ "step": 234936
1327
+ },
1328
+ {
1329
+ "epoch": 78.0,
1330
+ "eval_cer": 0.03656747416266495,
1331
+ "eval_loss": 0.1286703646183014,
1332
+ "eval_runtime": 154.3849,
1333
+ "eval_samples_per_second": 235.45,
1334
+ "eval_steps_per_second": 7.358,
1335
+ "eval_wer": 0.1952090920724516,
1336
+ "step": 234936
1337
+ },
1338
+ {
1339
+ "epoch": 79.0,
1340
+ "grad_norm": 2.635507345199585,
1341
+ "learning_rate": 1.4871020966638496e-05,
1342
+ "loss": 0.462,
1343
+ "step": 237948
1344
+ },
1345
+ {
1346
+ "epoch": 79.0,
1347
+ "eval_cer": 0.03641342670321015,
1348
+ "eval_loss": 0.12868022918701172,
1349
+ "eval_runtime": 152.2309,
1350
+ "eval_samples_per_second": 238.782,
1351
+ "eval_steps_per_second": 7.462,
1352
+ "eval_wer": 0.19447586042313,
1353
+ "step": 237948
1354
+ },
1355
+ {
1356
+ "epoch": 80.0,
1357
+ "grad_norm": 5.287237167358398,
1358
+ "learning_rate": 1.4163950259567787e-05,
1359
+ "loss": 0.4608,
1360
+ "step": 240960
1361
+ },
1362
+ {
1363
+ "epoch": 80.0,
1364
+ "eval_cer": 0.036332276702247354,
1365
+ "eval_loss": 0.12745150923728943,
1366
+ "eval_runtime": 154.8487,
1367
+ "eval_samples_per_second": 234.745,
1368
+ "eval_steps_per_second": 7.336,
1369
+ "eval_wer": 0.19517796431375398,
1370
+ "step": 240960
1371
+ },
1372
+ {
1373
+ "epoch": 81.0,
1374
+ "grad_norm": 11.394750595092773,
1375
+ "learning_rate": 1.3457349054958616e-05,
1376
+ "loss": 0.4594,
1377
+ "step": 243972
1378
+ },
1379
+ {
1380
+ "epoch": 81.0,
1381
+ "eval_cer": 0.0360989131401566,
1382
+ "eval_loss": 0.12770119309425354,
1383
+ "eval_runtime": 152.6766,
1384
+ "eval_samples_per_second": 238.085,
1385
+ "eval_steps_per_second": 7.441,
1386
+ "eval_wer": 0.19389480892744118,
1387
+ "step": 243972
1388
+ },
1389
+ {
1390
+ "epoch": 82.0,
1391
+ "grad_norm": 9.252601623535156,
1392
+ "learning_rate": 1.2750278347887909e-05,
1393
+ "loss": 0.4595,
1394
+ "step": 246984
1395
+ },
1396
+ {
1397
+ "epoch": 82.0,
1398
+ "eval_cer": 0.03594715805361035,
1399
+ "eval_loss": 0.12681059539318085,
1400
+ "eval_runtime": 152.0339,
1401
+ "eval_samples_per_second": 239.091,
1402
+ "eval_steps_per_second": 7.472,
1403
+ "eval_wer": 0.19371841829482137,
1404
+ "step": 246984
1405
+ },
1406
+ {
1407
+ "epoch": 83.0,
1408
+ "grad_norm": 16.776588439941406,
1409
+ "learning_rate": 1.2043677143278735e-05,
1410
+ "loss": 0.4575,
1411
+ "step": 249996
1412
+ },
1413
+ {
1414
+ "epoch": 83.0,
1415
+ "eval_cer": 0.03615393008996188,
1416
+ "eval_loss": 0.12722131609916687,
1417
+ "eval_runtime": 152.0731,
1418
+ "eval_samples_per_second": 239.03,
1419
+ "eval_steps_per_second": 7.47,
1420
+ "eval_wer": 0.1942475901926808,
1421
+ "step": 249996
1422
+ },
1423
+ {
1424
+ "epoch": 84.0,
1425
+ "grad_norm": 8.937053680419922,
1426
+ "learning_rate": 1.1336606436208028e-05,
1427
+ "loss": 0.4569,
1428
+ "step": 253008
1429
+ },
1430
+ {
1431
+ "epoch": 84.0,
1432
+ "eval_cer": 0.03605260720740382,
1433
+ "eval_loss": 0.12680456042289734,
1434
+ "eval_runtime": 152.9563,
1435
+ "eval_samples_per_second": 237.65,
1436
+ "eval_steps_per_second": 7.427,
1437
+ "eval_wer": 0.19341059934770052,
1438
+ "step": 253008
1439
+ },
1440
+ {
1441
+ "epoch": 85.0,
1442
+ "grad_norm": 13.364398956298828,
1443
+ "learning_rate": 1.0630005231598857e-05,
1444
+ "loss": 0.4552,
1445
+ "step": 256020
1446
+ },
1447
+ {
1448
+ "epoch": 85.0,
1449
+ "eval_cer": 0.035670697880838785,
1450
+ "eval_loss": 0.12619073688983917,
1451
+ "eval_runtime": 156.3815,
1452
+ "eval_samples_per_second": 232.444,
1453
+ "eval_steps_per_second": 7.264,
1454
+ "eval_wer": 0.1916293998222259,
1455
+ "step": 256020
1456
+ },
1457
+ {
1458
+ "epoch": 86.0,
1459
+ "grad_norm": 9.33234691619873,
1460
+ "learning_rate": 9.923404026989685e-06,
1461
+ "loss": 0.4538,
1462
+ "step": 259032
1463
+ },
1464
+ {
1465
+ "epoch": 86.0,
1466
+ "eval_cer": 0.03549051737022648,
1467
+ "eval_loss": 0.12592804431915283,
1468
+ "eval_runtime": 156.5096,
1469
+ "eval_samples_per_second": 232.254,
1470
+ "eval_steps_per_second": 7.258,
1471
+ "eval_wer": 0.19070940162071864,
1472
+ "step": 259032
1473
+ },
1474
+ {
1475
+ "epoch": 87.0,
1476
+ "grad_norm": 4.286988735198975,
1477
+ "learning_rate": 9.216568071149744e-06,
1478
+ "loss": 0.4532,
1479
+ "step": 262044
1480
+ },
1481
+ {
1482
+ "epoch": 87.0,
1483
+ "eval_cer": 0.03551573347222057,
1484
+ "eval_loss": 0.12575581669807434,
1485
+ "eval_runtime": 155.6329,
1486
+ "eval_samples_per_second": 233.562,
1487
+ "eval_steps_per_second": 7.299,
1488
+ "eval_wer": 0.19122473895915693,
1489
+ "step": 262044
1490
+ },
1491
+ {
1492
+ "epoch": 88.0,
1493
+ "grad_norm": 7.920403957366943,
1494
+ "learning_rate": 8.509732115309804e-06,
1495
+ "loss": 0.4524,
1496
+ "step": 265056
1497
+ },
1498
+ {
1499
+ "epoch": 88.0,
1500
+ "eval_cer": 0.03555103601501229,
1501
+ "eval_loss": 0.1259673833847046,
1502
+ "eval_runtime": 166.3365,
1503
+ "eval_samples_per_second": 218.533,
1504
+ "eval_steps_per_second": 6.83,
1505
+ "eval_wer": 0.19095150641058897,
1506
+ "step": 265056
1507
+ },
1508
+ {
1509
+ "epoch": 89.0,
1510
+ "grad_norm": 9.81010913848877,
1511
+ "learning_rate": 7.802896159469865e-06,
1512
+ "loss": 0.4501,
1513
+ "step": 268068
1514
+ },
1515
+ {
1516
+ "epoch": 89.0,
1517
+ "eval_cer": 0.03596458008771536,
1518
+ "eval_loss": 0.12655647099018097,
1519
+ "eval_runtime": 155.3246,
1520
+ "eval_samples_per_second": 234.026,
1521
+ "eval_steps_per_second": 7.314,
1522
+ "eval_wer": 0.19276729233461648,
1523
+ "step": 268068
1524
+ },
1525
+ {
1526
+ "epoch": 90.0,
1527
+ "grad_norm": 10.394911766052246,
1528
+ "learning_rate": 7.096060203629924e-06,
1529
+ "loss": 0.4491,
1530
+ "step": 271080
1531
+ },
1532
+ {
1533
+ "epoch": 90.0,
1534
+ "eval_cer": 0.03546667669197752,
1535
+ "eval_loss": 0.12519720196723938,
1536
+ "eval_runtime": 177.0207,
1537
+ "eval_samples_per_second": 205.343,
1538
+ "eval_steps_per_second": 6.417,
1539
+ "eval_wer": 0.19042579315258482,
1540
+ "step": 271080
1541
+ },
1542
+ {
1543
+ "epoch": 91.0,
1544
+ "grad_norm": 9.381885528564453,
1545
+ "learning_rate": 6.3892242477899846e-06,
1546
+ "loss": 0.4486,
1547
+ "step": 274092
1548
+ },
1549
+ {
1550
+ "epoch": 91.0,
1551
+ "eval_cer": 0.03518288092589859,
1552
+ "eval_loss": 0.12525735795497894,
1553
+ "eval_runtime": 173.742,
1554
+ "eval_samples_per_second": 209.218,
1555
+ "eval_steps_per_second": 6.538,
1556
+ "eval_wer": 0.18893166073509932,
1557
+ "step": 274092
1558
+ },
1559
+ {
1560
+ "epoch": 92.0,
1561
+ "grad_norm": 26.816091537475586,
1562
+ "learning_rate": 5.682623043180811e-06,
1563
+ "loss": 0.4487,
1564
+ "step": 277104
1565
+ },
1566
+ {
1567
+ "epoch": 92.0,
1568
+ "eval_cer": 0.035369480080654846,
1569
+ "eval_loss": 0.12525933980941772,
1570
+ "eval_runtime": 153.4339,
1571
+ "eval_samples_per_second": 236.91,
1572
+ "eval_steps_per_second": 7.404,
1573
+ "eval_wer": 0.190249402519965,
1574
+ "step": 277104
1575
+ },
1576
+ {
1577
+ "epoch": 93.0,
1578
+ "grad_norm": 12.131733894348145,
1579
+ "learning_rate": 4.975552336110105e-06,
1580
+ "loss": 0.4471,
1581
+ "step": 280116
1582
+ },
1583
+ {
1584
+ "epoch": 93.0,
1585
+ "eval_cer": 0.03521634957036347,
1586
+ "eval_loss": 0.1251526027917862,
1587
+ "eval_runtime": 176.0621,
1588
+ "eval_samples_per_second": 206.461,
1589
+ "eval_steps_per_second": 6.452,
1590
+ "eval_wer": 0.1893743666365765,
1591
+ "step": 280116
1592
+ },
1593
+ {
1594
+ "epoch": 94.0,
1595
+ "grad_norm": 14.601805686950684,
1596
+ "learning_rate": 4.2687163802701646e-06,
1597
+ "loss": 0.4458,
1598
+ "step": 283128
1599
+ },
1600
+ {
1601
+ "epoch": 94.0,
1602
+ "eval_cer": 0.0351705021121924,
1603
+ "eval_loss": 0.1253127008676529,
1604
+ "eval_runtime": 177.0148,
1605
+ "eval_samples_per_second": 205.35,
1606
+ "eval_steps_per_second": 6.418,
1607
+ "eval_wer": 0.18914263776627205,
1608
+ "step": 283128
1609
+ },
1610
+ {
1611
+ "epoch": 95.0,
1612
+ "grad_norm": 3.8078722953796387,
1613
+ "learning_rate": 3.562115175660992e-06,
1614
+ "loss": 0.4449,
1615
+ "step": 286140
1616
+ },
1617
+ {
1618
+ "epoch": 95.0,
1619
+ "eval_cer": 0.035079265670431965,
1620
+ "eval_loss": 0.12475291639566422,
1621
+ "eval_runtime": 168.7285,
1622
+ "eval_samples_per_second": 215.435,
1623
+ "eval_steps_per_second": 6.733,
1624
+ "eval_wer": 0.1884266993162269,
1625
+ "step": 286140
1626
+ },
1627
+ {
1628
+ "epoch": 96.0,
1629
+ "grad_norm": 4.562527656555176,
1630
+ "learning_rate": 2.855044468590285e-06,
1631
+ "loss": 0.4434,
1632
+ "step": 289152
1633
+ },
1634
+ {
1635
+ "epoch": 96.0,
1636
+ "eval_cer": 0.035126030077766456,
1637
+ "eval_loss": 0.1246921494603157,
1638
+ "eval_runtime": 179.6539,
1639
+ "eval_samples_per_second": 202.333,
1640
+ "eval_steps_per_second": 6.323,
1641
+ "eval_wer": 0.18909421680829797,
1642
+ "step": 289152
1643
+ },
1644
+ {
1645
+ "epoch": 97.0,
1646
+ "grad_norm": 8.47780990600586,
1647
+ "learning_rate": 2.1482085127503454e-06,
1648
+ "loss": 0.4435,
1649
+ "step": 292164
1650
+ },
1651
+ {
1652
+ "epoch": 97.0,
1653
+ "eval_cer": 0.035188841095460825,
1654
+ "eval_loss": 0.12471602112054825,
1655
+ "eval_runtime": 152.6845,
1656
+ "eval_samples_per_second": 238.073,
1657
+ "eval_steps_per_second": 7.44,
1658
+ "eval_wer": 0.18912880320685088,
1659
+ "step": 292164
1660
+ },
1661
+ {
1662
+ "epoch": 98.0,
1663
+ "grad_norm": 16.65981101989746,
1664
+ "learning_rate": 1.4413725569104053e-06,
1665
+ "loss": 0.4444,
1666
+ "step": 295176
1667
+ },
1668
+ {
1669
+ "epoch": 98.0,
1670
+ "eval_cer": 0.03511594363696882,
1671
+ "eval_loss": 0.12448572367429733,
1672
+ "eval_runtime": 159.0007,
1673
+ "eval_samples_per_second": 228.615,
1674
+ "eval_steps_per_second": 7.145,
1675
+ "eval_wer": 0.1887518114626242,
1676
+ "step": 295176
1677
+ },
1678
+ {
1679
+ "epoch": 99.0,
1680
+ "grad_norm": 10.248809814453125,
1681
+ "learning_rate": 7.345366010704655e-07,
1682
+ "loss": 0.4429,
1683
+ "step": 298188
1684
+ },
1685
+ {
1686
+ "epoch": 99.0,
1687
+ "eval_cer": 0.03512648855234817,
1688
+ "eval_loss": 0.12444119900465012,
1689
+ "eval_runtime": 155.6325,
1690
+ "eval_samples_per_second": 233.563,
1691
+ "eval_steps_per_second": 7.299,
1692
+ "eval_wer": 0.18870684914450545,
1693
+ "step": 298188
1694
+ }
1695
+ ],
1696
+ "logging_steps": 500,
1697
+ "max_steps": 301200,
1698
+ "num_input_tokens_seen": 0,
1699
+ "num_train_epochs": 100,
1700
+ "save_steps": 500,
1701
+ "stateful_callbacks": {
1702
+ "EarlyStoppingCallback": {
1703
+ "args": {
1704
+ "early_stopping_patience": 10,
1705
+ "early_stopping_threshold": 0.001
1706
+ },
1707
+ "attributes": {
1708
+ "early_stopping_patience_counter": 8
1709
+ }
1710
+ },
1711
+ "TrainerControl": {
1712
+ "args": {
1713
+ "should_epoch_stop": false,
1714
+ "should_evaluate": false,
1715
+ "should_log": false,
1716
+ "should_save": true,
1717
+ "should_training_stop": false
1718
+ },
1719
+ "attributes": {}
1720
+ }
1721
+ },
1722
+ "total_flos": 1.0954167922548843e+21,
1723
+ "train_batch_size": 64,
1724
+ "trial_name": null,
1725
+ "trial_params": null
1726
+ }
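The log above interleaves per-epoch training entries (`loss`, `grad_norm`, `learning_rate`) with evaluation entries (`eval_wer`, `eval_cer`, `eval_loss`). As a minimal sketch, assuming those entries sit in a list such as the `log_history` array that the Trainer writes to `trainer_state.json`, the best evaluation epoch can be picked out as follows. The helper name `best_eval` is hypothetical, and the sample holds only a few entries copied from the log, not the full history.

```python
# A few entries copied from the log above. In practice the full list would be
# loaded from a checkpoint's trainer_state.json (assumed key: "log_history").
log_history = [
    {"epoch": 97.0, "eval_loss": 0.12471602112054825,
     "eval_wer": 0.18912880320685088, "step": 292164},
    {"epoch": 98.0, "learning_rate": 1.4413725569104053e-06,
     "loss": 0.4444, "step": 295176},  # training entry, no eval_* keys
    {"epoch": 98.0, "eval_loss": 0.12448572367429733,
     "eval_wer": 0.1887518114626242, "step": 295176},
    {"epoch": 99.0, "eval_loss": 0.12444119900465012,
     "eval_wer": 0.18870684914450545, "step": 298188},
]


def best_eval(entries, metric="eval_wer"):
    """Return the evaluation entry with the lowest value of `metric`.

    Hypothetical helper: training entries are skipped because they do not
    carry the metric key.
    """
    evals = [e for e in entries if metric in e]
    return min(evals, key=lambda e: e[metric])


best = best_eval(log_history)
print(best["epoch"], best["step"], best["eval_wer"])
```

Passing `metric="eval_loss"` ranks the same entries by evaluation loss instead. Which metric the run itself used for checkpoint selection and early stopping is not stated in the log.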
checkpoint-298188/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:eb00c22830dc27d32027f8d7edee3b94a76e24ae2a1d322a9afad8ebb4846cd3
3
+ size 5841
checkpoint-301200/config.json ADDED
@@ -0,0 +1,108 @@
1
+ {
2
+ "activation_dropout": 0.0,
3
+ "adapter_attn_dim": null,
4
+ "adapter_kernel_size": 3,
5
+ "adapter_stride": 2,
6
+ "add_adapter": false,
7
+ "apply_spec_augment": true,
8
+ "architectures": [
9
+ "Wav2Vec2ForCTC"
10
+ ],
11
+ "attention_dropout": 0.0,
12
+ "bos_token_id": 1,
13
+ "classifier_proj_size": 256,
14
+ "codevector_dim": 256,
15
+ "contrastive_logits_temperature": 0.1,
16
+ "conv_bias": true,
17
+ "conv_dim": [
18
+ 512,
19
+ 512,
20
+ 512,
21
+ 512,
22
+ 512,
23
+ 512,
24
+ 512
25
+ ],
26
+ "conv_kernel": [
27
+ 10,
28
+ 3,
29
+ 3,
30
+ 3,
31
+ 3,
32
+ 2,
33
+ 2
34
+ ],
35
+ "conv_stride": [
36
+ 5,
37
+ 2,
38
+ 2,
39
+ 2,
40
+ 2,
41
+ 2,
42
+ 2
43
+ ],
44
+ "ctc_loss_reduction": "mean",
45
+ "ctc_zero_infinity": false,
46
+ "diversity_loss_weight": 0.1,
47
+ "do_stable_layer_norm": true,
48
+ "eos_token_id": 2,
49
+ "feat_extract_activation": "gelu",
50
+ "feat_extract_dropout": 0.0,
51
+ "feat_extract_norm": "layer",
52
+ "feat_proj_dropout": 0.0,
53
+ "feat_quantizer_dropout": 0.0,
54
+ "final_dropout": 0.0,
55
+ "hidden_act": "gelu",
56
+ "hidden_dropout": 0.0,
57
+ "hidden_dropout_prob": 0.0,
58
+ "hidden_size": 768,
59
+ "initializer_range": 0.02,
60
+ "intermediate_size": 3072,
61
+ "layer_norm_eps": 1e-05,
62
+ "layerdrop": 0.0,
63
+ "mask_feature_length": 10,
64
+ "mask_feature_min_masks": 0,
65
+ "mask_feature_prob": 0.0,
66
+ "mask_time_length": 10,
67
+ "mask_time_min_masks": 2,
68
+ "mask_time_prob": 0.65,
69
+ "model_type": "wav2vec2",
70
+ "num_adapter_layers": 3,
71
+ "num_attention_heads": 12,
72
+ "num_codevector_groups": 2,
73
+ "num_codevectors_per_group": 320,
74
+ "num_conv_pos_embedding_groups": 16,
75
+ "num_conv_pos_embeddings": 128,
76
+ "num_feat_extract_layers": 7,
77
+ "num_hidden_layers": 12,
78
+ "num_negatives": 100,
79
+ "output_hidden_size": 768,
80
+ "pad_token_id": 28,
81
+ "proj_codevector_dim": 256,
82
+ "tdnn_dilation": [
83
+ 1,
84
+ 2,
85
+ 3,
86
+ 1,
87
+ 1
88
+ ],
89
+ "tdnn_dim": [
90
+ 512,
91
+ 512,
92
+ 512,
93
+ 512,
94
+ 1500
95
+ ],
96
+ "tdnn_kernel": [
97
+ 5,
98
+ 3,
99
+ 3,
100
+ 1,
101
+ 1
102
+ ],
103
+ "torch_dtype": "float32",
104
+ "transformers_version": "4.51.3",
105
+ "use_weighted_layer_sum": false,
106
+ "vocab_size": 32,
107
+ "xvector_output_dim": 512
108
+ }
checkpoint-301200/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c2dc0ddc9facfeee4821b46da0115ab606db65b8b0961efa7222decd8a9a1b8f
+ size 377652400
checkpoint-301200/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:94f98a76825a5bfa275194a9bb12add8d2618c18bb45d34ef4fa748d87f9e65b
+ size 755442827
checkpoint-301200/preprocessor_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "do_normalize": true,
+   "feature_extractor_type": "Wav2Vec2FeatureExtractor",
+   "feature_size": 1,
+   "padding_side": "right",
+   "padding_value": 0.0,
+   "return_attention_mask": false,
+   "sampling_rate": 16000
+ }
checkpoint-301200/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e9e25a117aee5e36f884d7da3865bc327904b725e867f9e86bbcb36e725a7638
+ size 14709
checkpoint-301200/scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d87d8958a9902e004bf6e8cbb563a5675c7877505a81f2b44f10360d2af06ad8
+ size 1383
checkpoint-301200/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1bfcdfd49b4e9acb0437e449d5a650d09789eaffff1daea5f8d4dd2031cbec0f
+ size 1465
checkpoint-301200/trainer_state.json ADDED
@@ -0,0 +1,1743 @@
1
+ {
2
+ "best_global_step": 301200,
3
+ "best_metric": 0.1883851956379634,
4
+ "best_model_checkpoint": "wav2vec2-asr-africa-base-fintuned-luganda-400hrs-v0.1/checkpoint-301200",
5
+ "epoch": 100.0,
6
+ "eval_steps": 500,
7
+ "global_step": 301200,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 1.0,
14
+ "grad_norm": 7.066422462463379,
15
+ "learning_rate": 6.98140770252324e-05,
16
+ "loss": 3.2575,
17
+ "step": 3012
18
+ },
19
+ {
20
+ "epoch": 1.0,
21
+ "eval_cer": 0.10930492502565166,
22
+ "eval_loss": 0.37255266308784485,
23
+ "eval_runtime": 151.5408,
24
+ "eval_samples_per_second": 239.869,
25
+ "eval_steps_per_second": 7.496,
26
+ "eval_wer": 0.534090083733671,
27
+ "step": 3012
28
+ },
29
+ {
30
+ "epoch": 2.0,
31
+ "grad_norm": 15.221166610717773,
32
+ "learning_rate": 6.929480730277542e-05,
33
+ "loss": 0.8396,
34
+ "step": 6024
35
+ },
36
+ {
37
+ "epoch": 2.0,
38
+ "eval_cer": 0.07810481279107405,
39
+ "eval_loss": 0.2538328468799591,
40
+ "eval_runtime": 152.0166,
41
+ "eval_samples_per_second": 239.119,
42
+ "eval_steps_per_second": 7.473,
43
+ "eval_wer": 0.3983557626127949,
44
+ "step": 6024
45
+ },
46
+ {
47
+ "epoch": 3.0,
48
+ "grad_norm": 7.060754299163818,
49
+ "learning_rate": 6.858773659570472e-05,
50
+ "loss": 0.7487,
51
+ "step": 9036
52
+ },
53
+ {
54
+ "epoch": 3.0,
55
+ "eval_cer": 0.0682448584368034,
56
+ "eval_loss": 0.22769133746623993,
57
+ "eval_runtime": 153.0969,
58
+ "eval_samples_per_second": 237.431,
59
+ "eval_steps_per_second": 7.42,
60
+ "eval_wer": 0.35308562554689743,
61
+ "step": 9036
62
+ },
63
+ {
64
+ "epoch": 4.0,
65
+ "grad_norm": 9.472733497619629,
66
+ "learning_rate": 6.788090063986478e-05,
67
+ "loss": 0.7226,
68
+ "step": 12048
69
+ },
70
+ {
71
+ "epoch": 4.0,
72
+ "eval_cer": 0.06425062788093966,
73
+ "eval_loss": 0.21326717734336853,
74
+ "eval_runtime": 171.3308,
75
+ "eval_samples_per_second": 212.163,
76
+ "eval_steps_per_second": 6.63,
77
+ "eval_wer": 0.33642189872410777,
78
+ "step": 12048
79
+ },
80
+ {
81
+ "epoch": 5.0,
82
+ "grad_norm": 23.48148536682129,
83
+ "learning_rate": 6.717429943525561e-05,
84
+ "loss": 0.7096,
85
+ "step": 15060
86
+ },
87
+ {
88
+ "epoch": 5.0,
89
+ "eval_cer": 0.06122744648913919,
90
+ "eval_loss": 0.20842401683330536,
91
+ "eval_runtime": 174.4012,
92
+ "eval_samples_per_second": 208.427,
93
+ "eval_steps_per_second": 6.514,
94
+ "eval_wer": 0.3211001241651708,
95
+ "step": 15060
96
+ },
97
+ {
98
+ "epoch": 6.0,
99
+ "grad_norm": 7.9525041580200195,
100
+ "learning_rate": 6.646746347941567e-05,
101
+ "loss": 0.6979,
102
+ "step": 18072
103
+ },
104
+ {
105
+ "epoch": 6.0,
106
+ "eval_cer": 0.06395399482657282,
107
+ "eval_loss": 0.20914477109909058,
108
+ "eval_runtime": 153.4786,
109
+ "eval_samples_per_second": 236.841,
110
+ "eval_steps_per_second": 7.402,
111
+ "eval_wer": 0.3287402596055075,
112
+ "step": 18072
113
+ },
114
+ {
115
+ "epoch": 7.0,
116
+ "grad_norm": 43.115169525146484,
117
+ "learning_rate": 6.57608622748065e-05,
118
+ "loss": 0.6899,
119
+ "step": 21084
120
+ },
121
+ {
122
+ "epoch": 7.0,
123
+ "eval_cer": 0.06076026089037598,
124
+ "eval_loss": 0.20185638964176178,
125
+ "eval_runtime": 173.3256,
126
+ "eval_samples_per_second": 209.721,
127
+ "eval_steps_per_second": 6.554,
128
+ "eval_wer": 0.31623727652863237,
129
+ "step": 21084
130
+ },
131
+ {
132
+ "epoch": 8.0,
133
+ "grad_norm": 4.6848602294921875,
134
+ "learning_rate": 6.505426107019733e-05,
135
+ "loss": 0.6765,
136
+ "step": 24096
137
+ },
138
+ {
139
+ "epoch": 8.0,
140
+ "eval_cer": 0.060137193933831115,
141
+ "eval_loss": 0.19728189706802368,
142
+ "eval_runtime": 154.1162,
143
+ "eval_samples_per_second": 235.861,
144
+ "eval_steps_per_second": 7.371,
145
+ "eval_wer": 0.31059969356450884,
146
+ "step": 24096
147
+ },
148
+ {
149
+ "epoch": 9.0,
150
+ "grad_norm": 5.830124378204346,
151
+ "learning_rate": 6.434719036312661e-05,
152
+ "loss": 0.6701,
153
+ "step": 27108
154
+ },
155
+ {
156
+ "epoch": 9.0,
157
+ "eval_cer": 0.05820976679231927,
158
+ "eval_loss": 0.19281432032585144,
159
+ "eval_runtime": 156.5256,
160
+ "eval_samples_per_second": 232.23,
161
+ "eval_steps_per_second": 7.258,
162
+ "eval_wer": 0.30471654717065966,
163
+ "step": 27108
164
+ },
165
+ {
166
+ "epoch": 10.0,
167
+ "grad_norm": Infinity,
168
+ "learning_rate": 6.364058915851744e-05,
169
+ "loss": 0.6621,
170
+ "step": 30120
171
+ },
172
+ {
173
+ "epoch": 10.0,
174
+ "eval_cer": 0.058245986284274416,
175
+ "eval_loss": 0.19237777590751648,
176
+ "eval_runtime": 155.0055,
177
+ "eval_samples_per_second": 234.508,
178
+ "eval_steps_per_second": 7.329,
179
+ "eval_wer": 0.3038691804061135,
180
+ "step": 30120
181
+ },
182
+ {
183
+ "epoch": 11.0,
184
+ "grad_norm": 5.588762283325195,
185
+ "learning_rate": 6.29337532026775e-05,
186
+ "loss": 0.6554,
187
+ "step": 33132
188
+ },
189
+ {
190
+ "epoch": 11.0,
191
+ "eval_cer": 0.05662298626501848,
192
+ "eval_loss": 0.18665704131126404,
193
+ "eval_runtime": 153.969,
194
+ "eval_samples_per_second": 236.087,
195
+ "eval_steps_per_second": 7.378,
196
+ "eval_wer": 0.2982627252006876,
197
+ "step": 33132
198
+ },
199
+ {
200
+ "epoch": 12.0,
201
+ "grad_norm": 5.955714702606201,
202
+ "learning_rate": 6.222691724683756e-05,
203
+ "loss": 0.6475,
204
+ "step": 36144
205
+ },
206
+ {
207
+ "epoch": 12.0,
208
+ "eval_cer": 0.05515495065438077,
209
+ "eval_loss": 0.1829417496919632,
210
+ "eval_runtime": 155.1287,
211
+ "eval_samples_per_second": 234.322,
212
+ "eval_steps_per_second": 7.323,
213
+ "eval_wer": 0.2873610923768119,
214
+ "step": 36144
215
+ },
216
+ {
217
+ "epoch": 13.0,
218
+ "grad_norm": 31.985713958740234,
219
+ "learning_rate": 6.152008129099762e-05,
220
+ "loss": 0.6429,
221
+ "step": 39156
222
+ },
223
+ {
224
+ "epoch": 13.0,
225
+ "eval_cer": 0.05419032013446143,
226
+ "eval_loss": 0.1801947057247162,
227
+ "eval_runtime": 168.8898,
228
+ "eval_samples_per_second": 215.229,
229
+ "eval_steps_per_second": 6.726,
230
+ "eval_wer": 0.28527553254407173,
231
+ "step": 39156
232
+ },
233
+ {
234
+ "epoch": 14.0,
235
+ "grad_norm": 6.557770252227783,
236
+ "learning_rate": 6.081324533515768e-05,
237
+ "loss": 0.6351,
238
+ "step": 42168
239
+ },
240
+ {
241
+ "epoch": 14.0,
242
+ "eval_cer": 0.055327337097104,
243
+ "eval_loss": 0.18261073529720306,
244
+ "eval_runtime": 159.2183,
245
+ "eval_samples_per_second": 228.303,
246
+ "eval_steps_per_second": 7.135,
247
+ "eval_wer": 0.2872746263804296,
248
+ "step": 42168
249
+ },
250
+ {
251
+ "epoch": 15.0,
252
+ "grad_norm": 8.820505142211914,
253
+ "learning_rate": 6.010640937931774e-05,
254
+ "loss": 0.6319,
255
+ "step": 45180
256
+ },
257
+ {
258
+ "epoch": 15.0,
259
+ "eval_cer": 0.05439250742499585,
260
+ "eval_loss": 0.17926117777824402,
261
+ "eval_runtime": 152.0308,
262
+ "eval_samples_per_second": 239.096,
263
+ "eval_steps_per_second": 7.472,
264
+ "eval_wer": 0.28315884495263394,
265
+ "step": 45180
266
+ },
267
+ {
268
+ "epoch": 16.0,
269
+ "grad_norm": 8.058792114257812,
270
+ "learning_rate": 5.93995734234778e-05,
271
+ "loss": 0.6251,
272
+ "step": 48192
273
+ },
274
+ {
275
+ "epoch": 16.0,
276
+ "eval_cer": 0.054798715904391546,
277
+ "eval_loss": 0.1785019189119339,
278
+ "eval_runtime": 154.575,
279
+ "eval_samples_per_second": 235.161,
280
+ "eval_steps_per_second": 7.349,
281
+ "eval_wer": 0.283826362444705,
282
+ "step": 48192
283
+ },
284
+ {
285
+ "epoch": 17.0,
286
+ "grad_norm": 7.038857936859131,
287
+ "learning_rate": 5.86925027164071e-05,
288
+ "loss": 0.6172,
289
+ "step": 51204
290
+ },
291
+ {
292
+ "epoch": 17.0,
293
+ "eval_cer": 0.051710431121988164,
294
+ "eval_loss": 0.17091116309165955,
295
+ "eval_runtime": 154.6152,
296
+ "eval_samples_per_second": 235.1,
297
+ "eval_steps_per_second": 7.347,
298
+ "eval_wer": 0.27192518270265037,
299
+ "step": 51204
300
+ },
301
+ {
302
+ "epoch": 18.0,
303
+ "grad_norm": NaN,
304
+ "learning_rate": 5.7985901511797926e-05,
305
+ "loss": 0.6122,
306
+ "step": 54216
307
+ },
308
+ {
309
+ "epoch": 18.0,
310
+ "eval_cer": 0.05208454638066411,
311
+ "eval_loss": 0.1720370054244995,
312
+ "eval_runtime": 154.3969,
313
+ "eval_samples_per_second": 235.432,
314
+ "eval_steps_per_second": 7.358,
315
+ "eval_wer": 0.27160698783596365,
316
+ "step": 54216
317
+ },
318
+ {
319
+ "epoch": 19.0,
320
+ "grad_norm": 4.123114109039307,
321
+ "learning_rate": 5.727930030718875e-05,
322
+ "loss": 0.6068,
323
+ "step": 57228
324
+ },
325
+ {
326
+ "epoch": 19.0,
327
+ "eval_cer": 0.0505266497520111,
328
+ "eval_loss": 0.16939722001552582,
329
+ "eval_runtime": 154.2835,
330
+ "eval_samples_per_second": 235.605,
331
+ "eval_steps_per_second": 7.363,
332
+ "eval_wer": 0.26646744901100194,
333
+ "step": 57228
334
+ },
335
+ {
336
+ "epoch": 20.0,
337
+ "grad_norm": 11.014168739318848,
338
+ "learning_rate": 5.657222960011804e-05,
339
+ "loss": 0.6035,
340
+ "step": 60240
341
+ },
342
+ {
343
+ "epoch": 20.0,
344
+ "eval_cer": 0.049698644657441546,
345
+ "eval_loss": 0.1669510304927826,
346
+ "eval_runtime": 155.9471,
347
+ "eval_samples_per_second": 233.092,
348
+ "eval_steps_per_second": 7.285,
349
+ "eval_wer": 0.26278053892526226,
350
+ "step": 60240
351
+ },
352
+ {
353
+ "epoch": 21.0,
354
+ "grad_norm": 7.567544937133789,
355
+ "learning_rate": 5.5865158893047335e-05,
356
+ "loss": 0.5957,
357
+ "step": 63252
358
+ },
359
+ {
360
+ "epoch": 21.0,
361
+ "eval_cer": 0.050415698903237105,
362
+ "eval_loss": 0.1704263538122177,
363
+ "eval_runtime": 155.0829,
364
+ "eval_samples_per_second": 234.391,
365
+ "eval_steps_per_second": 7.325,
366
+ "eval_wer": 0.2643818891782618,
367
+ "step": 63252
368
+ },
369
+ {
370
+ "epoch": 22.0,
371
+ "grad_norm": 3.353114366531372,
372
+ "learning_rate": 5.5158557688438164e-05,
373
+ "loss": 0.5909,
374
+ "step": 66264
375
+ },
376
+ {
377
+ "epoch": 22.0,
378
+ "eval_cer": 0.049318569229203364,
379
+ "eval_loss": 0.16528591513633728,
380
+ "eval_runtime": 155.149,
381
+ "eval_samples_per_second": 234.291,
382
+ "eval_steps_per_second": 7.322,
383
+ "eval_wer": 0.25990640920551583,
384
+ "step": 66264
385
+ },
386
+ {
387
+ "epoch": 23.0,
388
+ "grad_norm": 5.273142337799072,
389
+ "learning_rate": 5.445172173259822e-05,
390
+ "loss": 0.5879,
391
+ "step": 69276
392
+ },
393
+ {
394
+ "epoch": 23.0,
395
+ "eval_cer": 0.048735389561267335,
396
+ "eval_loss": 0.16745983064174652,
397
+ "eval_runtime": 155.8132,
398
+ "eval_samples_per_second": 233.292,
399
+ "eval_steps_per_second": 7.291,
400
+ "eval_wer": 0.2573400984328903,
401
+ "step": 69276
402
+ },
403
+ {
404
+ "epoch": 24.0,
405
+ "grad_norm": 12.098519325256348,
406
+ "learning_rate": 5.374512052798905e-05,
407
+ "loss": 0.5966,
408
+ "step": 72288
409
+ },
410
+ {
411
+ "epoch": 24.0,
412
+ "eval_cer": 0.05103463958854657,
413
+ "eval_loss": 0.19431033730506897,
414
+ "eval_runtime": 154.0761,
415
+ "eval_samples_per_second": 235.922,
416
+ "eval_steps_per_second": 7.373,
417
+ "eval_wer": 0.2738551037419025,
418
+ "step": 72288
419
+ },
420
+ {
421
+ "epoch": 25.0,
422
+ "grad_norm": 12.023294448852539,
423
+ "learning_rate": 5.3038519323379875e-05,
424
+ "loss": 0.6444,
425
+ "step": 75300
426
+ },
427
+ {
428
+ "epoch": 25.0,
429
+ "eval_cer": 0.05154996501838942,
430
+ "eval_loss": 0.1868334412574768,
431
+ "eval_runtime": 152.5369,
432
+ "eval_samples_per_second": 238.303,
433
+ "eval_steps_per_second": 7.447,
434
+ "eval_wer": 0.27229871580702175,
435
+ "step": 75300
436
+ },
437
+ {
438
+ "epoch": 26.0,
439
+ "grad_norm": 9.066435813903809,
440
+ "learning_rate": 5.2331448616309165e-05,
441
+ "loss": 0.5999,
442
+ "step": 78312
443
+ },
444
+ {
445
+ "epoch": 26.0,
446
+ "eval_cer": 0.04910904634536157,
447
+ "eval_loss": 0.16771361231803894,
448
+ "eval_runtime": 154.4851,
449
+ "eval_samples_per_second": 235.298,
450
+ "eval_steps_per_second": 7.353,
451
+ "eval_wer": 0.25782430801263095,
452
+ "step": 78312
453
+ },
454
+ {
455
+ "epoch": 27.0,
456
+ "grad_norm": 9.274683952331543,
457
+ "learning_rate": 5.1624847411699994e-05,
458
+ "loss": 0.5911,
459
+ "step": 81324
460
+ },
461
+ {
462
+ "epoch": 27.0,
463
+ "eval_cer": 0.04738747429103783,
464
+ "eval_loss": 0.16794191300868988,
465
+ "eval_runtime": 154.6471,
466
+ "eval_samples_per_second": 235.051,
467
+ "eval_steps_per_second": 7.346,
468
+ "eval_wer": 0.25102462205712983,
469
+ "step": 81324
470
+ },
471
+ {
472
+ "epoch": 28.0,
473
+ "grad_norm": 5.650504112243652,
474
+ "learning_rate": 5.091777670462929e-05,
475
+ "loss": 0.586,
476
+ "step": 84336
477
+ },
478
+ {
479
+ "epoch": 28.0,
480
+ "eval_cer": 0.04840666328618075,
481
+ "eval_loss": 0.1722731739282608,
482
+ "eval_runtime": 153.0438,
483
+ "eval_samples_per_second": 237.514,
484
+ "eval_steps_per_second": 7.423,
485
+ "eval_wer": 0.25386416537832335,
486
+ "step": 84336
487
+ },
488
+ {
489
+ "epoch": 29.0,
490
+ "grad_norm": 18.539613723754883,
491
+ "learning_rate": 5.021070599755859e-05,
492
+ "loss": 0.5816,
493
+ "step": 87348
494
+ },
495
+ {
496
+ "epoch": 29.0,
497
+ "eval_cer": 0.04769969548118283,
498
+ "eval_loss": 0.16775010526180267,
499
+ "eval_runtime": 156.0962,
500
+ "eval_samples_per_second": 232.869,
501
+ "eval_steps_per_second": 7.278,
502
+ "eval_wer": 0.25264326550940575,
503
+ "step": 87348
504
+ },
505
+ {
506
+ "epoch": 30.0,
507
+ "grad_norm": 4.343358039855957,
508
+ "learning_rate": 4.950457429541094e-05,
509
+ "loss": 0.5886,
510
+ "step": 90360
511
+ },
512
+ {
513
+ "epoch": 30.0,
514
+ "eval_cer": 0.04993246669411401,
515
+ "eval_loss": 0.18236766755580902,
516
+ "eval_runtime": 151.7294,
517
+ "eval_samples_per_second": 239.571,
518
+ "eval_steps_per_second": 7.487,
519
+ "eval_wer": 0.2629396363586056,
520
+ "step": 90360
521
+ },
522
+ {
523
+ "epoch": 31.0,
524
+ "grad_norm": 13.675621032714844,
525
+ "learning_rate": 4.879773833957101e-05,
526
+ "loss": 0.5978,
527
+ "step": 93372
528
+ },
529
+ {
530
+ "epoch": 31.0,
531
+ "eval_cer": 0.04701886072734242,
532
+ "eval_loss": 0.16201142966747284,
533
+ "eval_runtime": 152.4808,
534
+ "eval_samples_per_second": 238.391,
535
+ "eval_steps_per_second": 7.45,
536
+ "eval_wer": 0.24908778373816712,
537
+ "step": 93372
538
+ },
539
+ {
540
+ "epoch": 32.0,
541
+ "grad_norm": 5.842775821685791,
542
+ "learning_rate": 4.809066763250029e-05,
543
+ "loss": 0.5722,
544
+ "step": 96384
545
+ },
546
+ {
547
+ "epoch": 32.0,
548
+ "eval_cer": 0.04652920987407537,
549
+ "eval_loss": 0.15837915241718292,
550
+ "eval_runtime": 153.9577,
551
+ "eval_samples_per_second": 236.104,
552
+ "eval_steps_per_second": 7.379,
553
+ "eval_wer": 0.24719590773732322,
554
+ "step": 96384
555
+ },
556
+ {
557
+ "epoch": 33.0,
558
+ "grad_norm": 12.179231643676758,
559
+ "learning_rate": 4.738359692542959e-05,
560
+ "loss": 0.5615,
561
+ "step": 99396
562
+ },
563
+ {
564
+ "epoch": 33.0,
565
+ "eval_cer": 0.046122542920097966,
566
+ "eval_loss": 0.15639054775238037,
567
+ "eval_runtime": 165.4748,
568
+ "eval_samples_per_second": 219.671,
569
+ "eval_steps_per_second": 6.865,
570
+ "eval_wer": 0.2421047898703356,
571
+ "step": 99396
572
+ },
573
+ {
574
+ "epoch": 34.0,
575
+ "grad_norm": 11.811565399169922,
576
+ "learning_rate": 4.6676760969589654e-05,
577
+ "loss": 0.5566,
578
+ "step": 102408
579
+ },
580
+ {
581
+ "epoch": 34.0,
582
+ "eval_cer": 0.04475858103950859,
583
+ "eval_loss": 0.15303878486156464,
584
+ "eval_runtime": 151.973,
585
+ "eval_samples_per_second": 239.187,
586
+ "eval_steps_per_second": 7.475,
587
+ "eval_wer": 0.23680615361203053,
588
+ "step": 102408
589
+ },
590
+ {
591
+ "epoch": 35.0,
592
+ "grad_norm": 27.314680099487305,
593
+ "learning_rate": 4.5970394516211246e-05,
594
+ "loss": 0.5514,
595
+ "step": 105420
596
+ },
597
+ {
598
+ "epoch": 35.0,
599
+ "eval_cer": 0.04322590051284967,
600
+ "eval_loss": 0.14988180994987488,
601
+ "eval_runtime": 152.0916,
602
+ "eval_samples_per_second": 239.001,
603
+ "eval_steps_per_second": 7.469,
604
+ "eval_wer": 0.2308780449000626,
605
+ "step": 105420
606
+ },
607
+ {
608
+ "epoch": 36.0,
609
+ "grad_norm": 6.874896049499512,
610
+ "learning_rate": 4.526332380914054e-05,
611
+ "loss": 0.5485,
612
+ "step": 108432
613
+ },
614
+ {
615
+ "epoch": 36.0,
616
+ "eval_cer": 0.043550500516700855,
617
+ "eval_loss": 0.15108104050159454,
618
+ "eval_runtime": 156.7097,
619
+ "eval_samples_per_second": 231.958,
620
+ "eval_steps_per_second": 7.249,
621
+ "eval_wer": 0.23083308258194382,
622
+ "step": 108432
623
+ },
624
+ {
625
+ "epoch": 37.0,
626
+ "grad_norm": 18.988279342651367,
627
+ "learning_rate": 4.4556722604531365e-05,
628
+ "loss": 0.5451,
629
+ "step": 111444
630
+ },
631
+ {
632
+ "epoch": 37.0,
633
+ "eval_cer": 0.04385080136772137,
634
+ "eval_loss": 0.15071320533752441,
635
+ "eval_runtime": 172.1252,
636
+ "eval_samples_per_second": 211.183,
637
+ "eval_steps_per_second": 6.6,
638
+ "eval_wer": 0.23185338133925454,
639
+ "step": 111444
640
+ },
641
+ {
642
+ "epoch": 38.0,
643
+ "grad_norm": 28.172042846679688,
644
+ "learning_rate": 4.384988664869143e-05,
645
+ "loss": 0.5433,
646
+ "step": 114456
647
+ },
648
+ {
649
+ "epoch": 38.0,
650
+ "eval_cer": 0.0433771971248142,
651
+ "eval_loss": 0.14824804663658142,
652
+ "eval_runtime": 152.6042,
653
+ "eval_samples_per_second": 238.198,
654
+ "eval_steps_per_second": 7.444,
655
+ "eval_wer": 0.23122390888559166,
656
+ "step": 114456
657
+ },
658
+ {
659
+ "epoch": 39.0,
660
+ "grad_norm": 4.653916835784912,
661
+ "learning_rate": 4.3143050692851484e-05,
662
+ "loss": 0.5391,
663
+ "step": 117468
664
+ },
665
+ {
666
+ "epoch": 39.0,
667
+ "eval_cer": 0.04353903865215809,
668
+ "eval_loss": 0.14684619009494781,
669
+ "eval_runtime": 156.6046,
670
+ "eval_samples_per_second": 232.113,
671
+ "eval_steps_per_second": 7.254,
672
+ "eval_wer": 0.22910722129415387,
673
+ "step": 117468
674
+ },
675
+ {
676
+ "epoch": 40.0,
677
+ "grad_norm": 5.8400702476501465,
678
+ "learning_rate": 4.243621473701154e-05,
679
+ "loss": 0.5347,
680
+ "step": 120480
681
+ },
682
+ {
683
+ "epoch": 40.0,
684
+ "eval_cer": 0.042970530170836796,
685
+ "eval_loss": 0.1462726891040802,
686
+ "eval_runtime": 178.4468,
687
+ "eval_samples_per_second": 203.702,
688
+ "eval_steps_per_second": 6.366,
689
+ "eval_wer": 0.22744707416361443,
690
+ "step": 120480
691
+ },
692
+ {
693
+ "epoch": 41.0,
694
+ "grad_norm": 16.060422897338867,
695
+ "learning_rate": 4.172914402994084e-05,
696
+ "loss": 0.5313,
697
+ "step": 123492
698
+ },
699
+ {
700
+ "epoch": 41.0,
701
+ "eval_cer": 0.04216269795786252,
702
+ "eval_loss": 0.14503081142902374,
703
+ "eval_runtime": 152.5692,
704
+ "eval_samples_per_second": 238.253,
705
+ "eval_steps_per_second": 7.446,
706
+ "eval_wer": 0.22400226886774507,
707
+ "step": 123492
708
+ },
709
+ {
710
+ "epoch": 42.0,
711
+ "grad_norm": 8.755998611450195,
712
+ "learning_rate": 4.102254282533167e-05,
713
+ "loss": 0.5291,
714
+ "step": 126504
715
+ },
716
+ {
717
+ "epoch": 42.0,
718
+ "eval_cer": 0.04194263015864137,
719
+ "eval_loss": 0.1446152627468109,
720
+ "eval_runtime": 152.4116,
721
+ "eval_samples_per_second": 238.499,
722
+ "eval_steps_per_second": 7.454,
723
+ "eval_wer": 0.224061065745285,
724
+ "step": 126504
725
+ },
726
+ {
727
+ "epoch": 43.0,
728
+ "grad_norm": 9.50841999053955,
729
+ "learning_rate": 4.031570686949173e-05,
730
+ "loss": 0.5269,
731
+ "step": 129516
732
+ },
733
+ {
734
+ "epoch": 43.0,
735
+ "eval_cer": 0.04270048864220919,
736
+ "eval_loss": 0.14530107378959656,
737
+ "eval_runtime": 153.731,
738
+ "eval_samples_per_second": 236.452,
739
+ "eval_steps_per_second": 7.39,
740
+ "eval_wer": 0.22547219080624353,
741
+ "step": 129516
742
+ },
743
+ {
744
+ "epoch": 44.0,
745
+ "grad_norm": 28.027828216552734,
746
+ "learning_rate": 3.960887091365178e-05,
747
+ "loss": 0.5253,
748
+ "step": 132528
749
+ },
750
+ {
751
+ "epoch": 44.0,
752
+ "eval_cer": 0.042549192030244654,
753
+ "eval_loss": 0.14459766447544098,
754
+ "eval_runtime": 179.3972,
755
+ "eval_samples_per_second": 202.623,
756
+ "eval_steps_per_second": 6.332,
757
+ "eval_wer": 0.22531309337290018,
758
+ "step": 132528
759
+ },
760
+ {
761
+ "epoch": 45.0,
762
+ "grad_norm": 9.71485710144043,
763
+ "learning_rate": 3.890226970904261e-05,
764
+ "loss": 0.523,
765
+ "step": 135540
766
+ },
767
+ {
768
+ "epoch": 45.0,
769
+ "eval_cer": 0.041189356420890666,
770
+ "eval_loss": 0.1429988592863083,
771
+ "eval_runtime": 151.6718,
772
+ "eval_samples_per_second": 239.662,
773
+ "eval_steps_per_second": 7.49,
774
+ "eval_wer": 0.22018738910735963,
775
+ "step": 135540
776
+ },
777
+ {
778
+ "epoch": 46.0,
779
+ "grad_norm": 13.113300323486328,
780
+ "learning_rate": 3.819519900197191e-05,
781
+ "loss": 0.5192,
782
+ "step": 138552
783
+ },
784
+ {
785
+ "epoch": 46.0,
786
+ "eval_cer": 0.040866590315366325,
787
+ "eval_loss": 0.14137160778045654,
788
+ "eval_runtime": 152.9576,
789
+ "eval_samples_per_second": 237.648,
790
+ "eval_steps_per_second": 7.427,
791
+ "eval_wer": 0.21718528971296747,
792
+ "step": 138552
793
+ },
794
+ {
795
+ "epoch": 47.0,
796
+ "grad_norm": 15.17225456237793,
797
+ "learning_rate": 3.7488597797362736e-05,
798
+ "loss": 0.518,
799
+ "step": 141564
800
+ },
801
+ {
802
+ "epoch": 47.0,
803
+ "eval_cer": 0.040502561497488015,
804
+ "eval_loss": 0.14037571847438812,
805
+ "eval_runtime": 151.9659,
806
+ "eval_samples_per_second": 239.198,
807
+ "eval_steps_per_second": 7.475,
808
+ "eval_wer": 0.21598168304332638,
809
+ "step": 141564
810
+ },
811
+ {
812
+ "epoch": 48.0,
813
+ "grad_norm": 5.159524917602539,
814
+ "learning_rate": 3.678152709029203e-05,
815
+ "loss": 0.5139,
816
+ "step": 144576
817
+ },
818
+ {
819
+ "epoch": 48.0,
820
+ "eval_cer": 0.04006930301777139,
821
+ "eval_loss": 0.13999390602111816,
822
+ "eval_runtime": 161.9319,
823
+ "eval_samples_per_second": 224.477,
824
+ "eval_steps_per_second": 7.015,
825
+ "eval_wer": 0.2143319118323528,
826
+ "step": 144576
827
+ },
828
+ {
829
+ "epoch": 49.0,
830
+ "grad_norm": 5.24137020111084,
831
+ "learning_rate": 3.6074925885682855e-05,
832
+ "loss": 0.5133,
833
+ "step": 147588
834
+ },
835
+ {
836
+ "epoch": 49.0,
837
+ "eval_cer": 0.04118523014965527,
838
+ "eval_loss": 0.14138683676719666,
839
+ "eval_runtime": 154.3411,
840
+ "eval_samples_per_second": 235.517,
841
+ "eval_steps_per_second": 7.36,
842
+ "eval_wer": 0.21796694232026315,
843
+ "step": 147588
844
+ },
845
+ {
846
+ "epoch": 50.0,
847
+ "grad_norm": 4.531148910522461,
848
+ "learning_rate": 3.5367855178612144e-05,
849
+ "loss": 0.5114,
850
+ "step": 150600
851
+ },
852
+ {
853
+ "epoch": 50.0,
854
+ "eval_cer": 0.04037235471628217,
855
+ "eval_loss": 0.14019279181957245,
856
+ "eval_runtime": 152.1041,
857
+ "eval_samples_per_second": 238.981,
858
+ "eval_steps_per_second": 7.469,
859
+ "eval_wer": 0.21485070781064639,
860
+ "step": 150600
861
+ },
862
+ {
863
+ "epoch": 51.0,
864
+ "grad_norm": 6.761490821838379,
865
+ "learning_rate": 3.4661019222772204e-05,
866
+ "loss": 0.5087,
867
+ "step": 153612
868
+ },
869
+ {
870
+ "epoch": 51.0,
871
+ "eval_cer": 0.04063826997367439,
872
+ "eval_loss": 0.14041763544082642,
873
+ "eval_runtime": 154.2132,
874
+ "eval_samples_per_second": 235.713,
875
+ "eval_steps_per_second": 7.366,
876
+ "eval_wer": 0.21654889997959403,
877
+ "step": 153612
878
+ },
879
+ {
880
+ "epoch": 52.0,
881
+ "grad_norm": 4.379857540130615,
882
+ "learning_rate": 3.395441801816303e-05,
883
+ "loss": 0.5066,
884
+ "step": 156624
885
+ },
886
+ {
887
+ "epoch": 52.0,
888
+ "eval_cer": 0.04042691319150575,
889
+ "eval_loss": 0.13891662657260895,
890
+ "eval_runtime": 159.6511,
891
+ "eval_samples_per_second": 227.684,
892
+ "eval_steps_per_second": 7.116,
893
+ "eval_wer": 0.215715367774469,
894
+ "step": 156624
895
+ },
896
+ {
897
+ "epoch": 53.0,
898
+ "grad_norm": 19.332077026367188,
899
+ "learning_rate": 3.324734731109233e-05,
900
+ "loss": 0.5037,
901
+ "step": 159636
902
+ },
903
+ {
904
+ "epoch": 53.0,
905
+ "eval_cer": 0.03977129453965943,
906
+ "eval_loss": 0.1375364065170288,
907
+ "eval_runtime": 164.6123,
908
+ "eval_samples_per_second": 220.822,
909
+ "eval_steps_per_second": 6.901,
910
+ "eval_wer": 0.2132078538793834,
911
+ "step": 159636
912
+ },
913
+ {
914
+ "epoch": 54.0,
915
+ "grad_norm": 11.953184127807617,
916
+ "learning_rate": 3.254098085771392e-05,
917
+ "loss": 0.5024,
918
+ "step": 162648
919
+ },
920
+ {
921
+ "epoch": 54.0,
922
+ "eval_cer": 0.039842816574406296,
923
+ "eval_loss": 0.13721118867397308,
924
+ "eval_runtime": 156.3278,
925
+ "eval_samples_per_second": 232.524,
926
+ "eval_steps_per_second": 7.267,
927
+ "eval_wer": 0.21213221688438805,
928
+ "step": 162648
929
+ },
930
+ {
931
+ "epoch": 55.0,
932
+ "grad_norm": 7.574815273284912,
933
+ "learning_rate": 3.183391015064322e-05,
934
+ "loss": 0.5,
935
+ "step": 165660
936
+ },
937
+ {
938
+ "epoch": 55.0,
939
+ "eval_cer": 0.04010139623849114,
940
+ "eval_loss": 0.13785392045974731,
941
+ "eval_runtime": 152.1653,
942
+ "eval_samples_per_second": 238.885,
943
+ "eval_steps_per_second": 7.466,
944
+ "eval_wer": 0.2131732674808305,
945
+ "step": 165660
946
+ },
947
+ {
948
+ "epoch": 56.0,
949
+ "grad_norm": 11.841280937194824,
950
+ "learning_rate": 3.112707419480328e-05,
951
+ "loss": 0.4976,
952
+ "step": 168672
953
+ },
954
+ {
955
+ "epoch": 56.0,
956
+ "eval_cer": 0.038647114865304755,
957
+ "eval_loss": 0.13485907018184662,
958
+ "eval_runtime": 153.6401,
959
+ "eval_samples_per_second": 236.592,
960
+ "eval_steps_per_second": 7.394,
961
+ "eval_wer": 0.20721403101016495,
962
+ "step": 168672
963
+ },
964
+ {
965
+ "epoch": 57.0,
966
+ "grad_norm": 14.78765869140625,
967
+ "learning_rate": 3.0420238238963334e-05,
968
+ "loss": 0.4948,
969
+ "step": 171684
970
+ },
971
+ {
972
+ "epoch": 57.0,
973
+ "eval_cer": 0.0392779758897387,
974
+ "eval_loss": 0.13624149560928345,
975
+ "eval_runtime": 152.0309,
976
+ "eval_samples_per_second": 239.096,
977
+ "eval_steps_per_second": 7.472,
978
+ "eval_wer": 0.2102472581632547,
979
+ "step": 171684
980
+ },
981
+ {
982
+ "epoch": 58.0,
983
+ "grad_norm": 7.023338794708252,
984
+ "learning_rate": 2.9713402283123397e-05,
985
+ "loss": 0.4933,
986
+ "step": 174696
987
+ },
988
+ {
989
+ "epoch": 58.0,
990
+ "eval_cer": 0.038942372495926456,
991
+ "eval_loss": 0.13551433384418488,
992
+ "eval_runtime": 151.9877,
993
+ "eval_samples_per_second": 239.164,
994
+ "eval_steps_per_second": 7.474,
995
+ "eval_wer": 0.20676094918912188,
996
+ "step": 174696
997
+ },
998
+ {
999
+ "epoch": 59.0,
1000
+ "grad_norm": 6.568221092224121,
1001
+ "learning_rate": 2.9006801078514222e-05,
1002
+ "loss": 0.4924,
1003
+ "step": 177708
1004
+ },
1005
+ {
1006
+ "epoch": 59.0,
1007
+ "eval_cer": 0.03848756571086942,
1008
+ "eval_loss": 0.13611619174480438,
1009
+ "eval_runtime": 177.5315,
1010
+ "eval_samples_per_second": 204.752,
1011
+ "eval_steps_per_second": 6.399,
1012
+ "eval_wer": 0.20549508700208555,
1013
+ "step": 177708
1014
+ },
1015
+ {
1016
+ "epoch": 60.0,
1017
+ "grad_norm": 23.931304931640625,
1018
+ "learning_rate": 2.8300199873905052e-05,
1019
+ "loss": 0.4901,
1020
+ "step": 180720
1021
+ },
1022
+ {
1023
+ "epoch": 60.0,
1024
+ "eval_cer": 0.03840274791325294,
1025
+ "eval_loss": 0.13464532792568207,
1026
+ "eval_runtime": 153.1916,
1027
+ "eval_samples_per_second": 237.285,
1028
+ "eval_steps_per_second": 7.416,
1029
+ "eval_wer": 0.2053671173274398,
1030
+ "step": 180720
1031
+ },
1032
+ {
1033
+ "epoch": 61.0,
1034
+ "grad_norm": 18.084495544433594,
1035
+ "learning_rate": 2.759312916683434e-05,
1036
+ "loss": 0.4898,
1037
+ "step": 183732
1038
+ },
1039
+ {
1040
+ "epoch": 61.0,
1041
+ "eval_cer": 0.038370654692533195,
1042
+ "eval_loss": 0.13341517746448517,
1043
+ "eval_runtime": 151.6501,
1044
+ "eval_samples_per_second": 239.696,
1045
+ "eval_steps_per_second": 7.491,
1046
+ "eval_wer": 0.2050074187824896,
1047
+ "step": 183732
1048
+ },
1049
+ {
1050
+ "epoch": 62.0,
1051
+ "grad_norm": 2.890596866607666,
1052
+ "learning_rate": 2.6886293210994404e-05,
1053
+ "loss": 0.4873,
1054
+ "step": 186744
1055
+ },
1056
+ {
1057
+ "epoch": 62.0,
1058
+ "eval_cer": 0.038351857234683054,
1059
+ "eval_loss": 0.1341981142759323,
1060
+ "eval_runtime": 150.7747,
1061
+ "eval_samples_per_second": 241.088,
1062
+ "eval_steps_per_second": 7.534,
1063
+ "eval_wer": 0.20600696570066857,
1064
+ "step": 186744
1065
+ },
1066
+ {
1067
+ "epoch": 63.0,
1068
+ "grad_norm": 11.759881973266602,
1069
+ "learning_rate": 2.617969200638523e-05,
1070
+ "loss": 0.4865,
1071
+ "step": 189756
1072
+ },
1073
+ {
1074
+ "epoch": 63.0,
1075
+ "eval_cer": 0.03869296232347583,
1076
+ "eval_loss": 0.13458400964736938,
1077
+ "eval_runtime": 152.6729,
1078
+ "eval_samples_per_second": 238.091,
1079
+ "eval_steps_per_second": 7.441,
1080
+ "eval_wer": 0.20699613669928163,
1081
+ "step": 189756
1082
+ },
1083
+ {
1084
+ "epoch": 64.0,
1085
+ "grad_norm": 13.245360374450684,
1086
+ "learning_rate": 2.547309080177606e-05,
1087
+ "loss": 0.4842,
1088
+ "step": 192768
1089
+ },
1090
+ {
1091
+ "epoch": 64.0,
1092
+ "eval_cer": 0.03874110215455545,
1093
+ "eval_loss": 0.13456492125988007,
1094
+ "eval_runtime": 153.3987,
1095
+ "eval_samples_per_second": 236.964,
1096
+ "eval_steps_per_second": 7.406,
1097
+ "eval_wer": 0.2072278655695861,
1098
+ "step": 192768
1099
+ },
1100
+ {
1101
+ "epoch": 65.0,
1102
+ "grad_norm": 11.684355735778809,
1103
+ "learning_rate": 2.4766020094705352e-05,
1104
+ "loss": 0.4822,
1105
+ "step": 195780
1106
+ },
1107
+ {
1108
+ "epoch": 65.0,
1109
+ "eval_cer": 0.03811941062175572,
1110
+ "eval_loss": 0.13252592086791992,
1111
+ "eval_runtime": 156.7414,
1112
+ "eval_samples_per_second": 231.911,
1113
+ "eval_steps_per_second": 7.248,
1114
+ "eval_wer": 0.20395599226648128,
1115
+ "step": 195780
1116
+ },
1117
+ {
1118
+ "epoch": 66.0,
1119
+ "grad_norm": 25.19974708557129,
1120
+ "learning_rate": 2.405918413886541e-05,
1121
+ "loss": 0.4814,
1122
+ "step": 198792
1123
+ },
1124
+ {
1125
+ "epoch": 66.0,
1126
+ "eval_cer": 0.037090135185815165,
1127
+ "eval_loss": 0.13119570910930634,
1128
+ "eval_runtime": 154.348,
1129
+ "eval_samples_per_second": 235.507,
1130
+ "eval_steps_per_second": 7.36,
1131
+ "eval_wer": 0.19890983671761242,
1132
+ "step": 198792
1133
+ },
1134
+ {
1135
+ "epoch": 67.0,
1136
+ "grad_norm": 12.580814361572266,
1137
+ "learning_rate": 2.335234818302547e-05,
1138
+ "loss": 0.4796,
1139
+ "step": 201804
1140
+ },
1141
+ {
1142
+ "epoch": 67.0,
1143
+ "eval_cer": 0.03740739959635898,
1144
+ "eval_loss": 0.13117973506450653,
1145
+ "eval_runtime": 162.9804,
1146
+ "eval_samples_per_second": 223.033,
1147
+ "eval_steps_per_second": 6.97,
1148
+ "eval_wer": 0.1999750977930419,
1149
+ "step": 201804
1150
+ },
1151
+ {
1152
+ "epoch": 68.0,
1153
+ "grad_norm": 11.110360145568848,
1154
+ "learning_rate": 2.2645746978416297e-05,
1155
+ "loss": 0.4771,
1156
+ "step": 204816
1157
+ },
1158
+ {
1159
+ "epoch": 68.0,
1160
+ "eval_cer": 0.037213006373713636,
1161
+ "eval_loss": 0.1303921490907669,
1162
+ "eval_runtime": 152.708,
1163
+ "eval_samples_per_second": 238.036,
1164
+ "eval_steps_per_second": 7.439,
1165
+ "eval_wer": 0.1997191584437504,
1166
+ "step": 204816
1167
+ },
1168
+ {
1169
+ "epoch": 69.0,
1170
+ "grad_norm": 4.782271385192871,
1171
+ "learning_rate": 2.193891102257636e-05,
1172
+ "loss": 0.4756,
1173
+ "step": 207828
1174
+ },
1175
+ {
1176
+ "epoch": 69.0,
1177
+ "eval_cer": 0.037708617396542916,
1178
+ "eval_loss": 0.13083402812480927,
1179
+ "eval_runtime": 152.8061,
1180
+ "eval_samples_per_second": 237.883,
1181
+ "eval_steps_per_second": 7.434,
1182
+ "eval_wer": 0.20086396823585156,
1183
+ "step": 207828
1184
+ },
1185
+ {
1186
+ "epoch": 70.0,
1187
+ "grad_norm": 7.983453750610352,
1188
+ "learning_rate": 2.1232544569197956e-05,
1189
+ "loss": 0.4745,
1190
+ "step": 210840
1191
+ },
1192
+ {
1193
+ "epoch": 70.0,
1194
+ "eval_cer": 0.0370488724734612,
1195
+ "eval_loss": 0.13116249442100525,
1196
+ "eval_runtime": 151.9447,
1197
+ "eval_samples_per_second": 239.232,
1198
+ "eval_steps_per_second": 7.476,
1199
+ "eval_wer": 0.19823886058568607,
1200
+ "step": 210840
1201
+ },
1202
+ {
1203
+ "epoch": 71.0,
1204
+ "grad_norm": 23.63794708251953,
1205
+ "learning_rate": 2.052547386212725e-05,
1206
+ "loss": 0.4738,
1207
+ "step": 213852
1208
+ },
1209
+ {
1210
+ "epoch": 71.0,
1211
+ "eval_cer": 0.037366136884005016,
1212
+ "eval_loss": 0.1306936889886856,
1213
+ "eval_runtime": 154.0224,
1214
+ "eval_samples_per_second": 236.005,
1215
+ "eval_steps_per_second": 7.376,
1216
+ "eval_wer": 0.20006848106913475,
1217
+ "step": 213852
1218
+ },
1219
+ {
1220
+ "epoch": 72.0,
1221
+ "grad_norm": 10.838956832885742,
1222
+ "learning_rate": 1.9818637906287305e-05,
1223
+ "loss": 0.473,
1224
+ "step": 216864
1225
+ },
1226
+ {
1227
+ "epoch": 72.0,
1228
+ "eval_cer": 0.0372285945094918,
1229
+ "eval_loss": 0.13071005046367645,
1230
+ "eval_runtime": 154.8642,
1231
+ "eval_samples_per_second": 234.722,
1232
+ "eval_steps_per_second": 7.335,
1233
+ "eval_wer": 0.19911043782921928,
1234
+ "step": 216864
1235
+ },
1236
+ {
1237
+ "epoch": 73.0,
1238
+ "grad_norm": 11.969744682312012,
1239
+ "learning_rate": 1.9111801950447367e-05,
1240
+ "loss": 0.472,
1241
+ "step": 219876
1242
+ },
1243
+ {
1244
+ "epoch": 73.0,
1245
+ "eval_cer": 0.03662890975661418,
1246
+ "eval_loss": 0.12924158573150635,
1247
+ "eval_runtime": 154.3055,
1248
+ "eval_samples_per_second": 235.572,
1249
+ "eval_steps_per_second": 7.362,
1250
+ "eval_wer": 0.19607375203627422,
1251
+ "step": 219876
1252
+ },
1253
+ {
1254
+ "epoch": 74.0,
1255
+ "grad_norm": 6.115599155426025,
1256
+ "learning_rate": 1.840496599460743e-05,
1257
+ "loss": 0.4693,
1258
+ "step": 222888
1259
+ },
1260
+ {
1261
+ "epoch": 74.0,
1262
+ "eval_cer": 0.036412051279465014,
1263
+ "eval_loss": 0.12866230309009552,
1264
+ "eval_runtime": 157.4725,
1265
+ "eval_samples_per_second": 230.834,
1266
+ "eval_steps_per_second": 7.214,
1267
+ "eval_wer": 0.19521600935216216,
1268
+ "step": 222888
1269
+ },
1270
+ {
1271
+ "epoch": 75.0,
1272
+ "grad_norm": 44.51272964477539,
1273
+ "learning_rate": 1.7698130038767486e-05,
1274
+ "loss": 0.4693,
1275
+ "step": 225900
1276
+ },
1277
+ {
1278
+ "epoch": 75.0,
1279
+ "eval_cer": 0.03628138602367746,
1280
+ "eval_loss": 0.12844808399677277,
1281
+ "eval_runtime": 164.6927,
1282
+ "eval_samples_per_second": 220.714,
1283
+ "eval_steps_per_second": 6.898,
1284
+ "eval_wer": 0.1944724017832747,
1285
+ "step": 225900
1286
+ },
1287
+ {
1288
+ "epoch": 76.0,
1289
+ "grad_norm": 9.162590026855469,
1290
+ "learning_rate": 1.6991528834158312e-05,
1291
+ "loss": 0.4664,
1292
+ "step": 228912
1293
+ },
1294
+ {
1295
+ "epoch": 76.0,
1296
+ "eval_cer": 0.03683797416587427,
1297
+ "eval_loss": 0.12876588106155396,
1298
+ "eval_runtime": 152.2551,
1299
+ "eval_samples_per_second": 238.744,
1300
+ "eval_steps_per_second": 7.461,
1301
+ "eval_wer": 0.19688999104212276,
1302
+ "step": 228912
1303
+ },
1304
+ {
1305
+ "epoch": 77.0,
1306
+ "grad_norm": 3.896597146987915,
1307
+ "learning_rate": 1.6284692878318375e-05,
1308
+ "loss": 0.4651,
1309
+ "step": 231924
1310
+ },
1311
+ {
1312
+ "epoch": 77.0,
1313
+ "eval_cer": 0.03683384789463887,
1314
+ "eval_loss": 0.12869854271411896,
1315
+ "eval_runtime": 168.0203,
1316
+ "eval_samples_per_second": 216.343,
1317
+ "eval_steps_per_second": 6.761,
1318
+ "eval_wer": 0.1971009680732955,
1319
+ "step": 231924
1320
+ },
1321
+ {
1322
+ "epoch": 78.0,
1323
+ "grad_norm": 15.388148307800293,
1324
+ "learning_rate": 1.5577856922478434e-05,
1325
+ "loss": 0.4641,
1326
+ "step": 234936
1327
+ },
1328
+ {
1329
+ "epoch": 78.0,
1330
+ "eval_cer": 0.03656747416266495,
1331
+ "eval_loss": 0.1286703646183014,
1332
+ "eval_runtime": 154.3849,
1333
+ "eval_samples_per_second": 235.45,
1334
+ "eval_steps_per_second": 7.358,
1335
+ "eval_wer": 0.1952090920724516,
1336
+ "step": 234936
1337
+ },
1338
+ {
1339
+ "epoch": 79.0,
1340
+ "grad_norm": 2.635507345199585,
1341
+ "learning_rate": 1.4871020966638496e-05,
1342
+ "loss": 0.462,
1343
+ "step": 237948
1344
+ },
1345
+ {
1346
+ "epoch": 79.0,
1347
+ "eval_cer": 0.03641342670321015,
1348
+ "eval_loss": 0.12868022918701172,
1349
+ "eval_runtime": 152.2309,
1350
+ "eval_samples_per_second": 238.782,
1351
+ "eval_steps_per_second": 7.462,
1352
+ "eval_wer": 0.19447586042313,
1353
+ "step": 237948
1354
+ },
1355
+ {
1356
+ "epoch": 80.0,
1357
+ "grad_norm": 5.287237167358398,
1358
+ "learning_rate": 1.4163950259567787e-05,
1359
+ "loss": 0.4608,
1360
+ "step": 240960
1361
+ },
1362
+ {
1363
+ "epoch": 80.0,
1364
+ "eval_cer": 0.036332276702247354,
1365
+ "eval_loss": 0.12745150923728943,
1366
+ "eval_runtime": 154.8487,
1367
+ "eval_samples_per_second": 234.745,
1368
+ "eval_steps_per_second": 7.336,
1369
+ "eval_wer": 0.19517796431375398,
1370
+ "step": 240960
1371
+ },
1372
+ {
1373
+ "epoch": 81.0,
1374
+ "grad_norm": 11.394750595092773,
1375
+ "learning_rate": 1.3457349054958616e-05,
1376
+ "loss": 0.4594,
1377
+ "step": 243972
1378
+ },
1379
+ {
1380
+ "epoch": 81.0,
1381
+ "eval_cer": 0.0360989131401566,
1382
+ "eval_loss": 0.12770119309425354,
1383
+ "eval_runtime": 152.6766,
1384
+ "eval_samples_per_second": 238.085,
1385
+ "eval_steps_per_second": 7.441,
1386
+ "eval_wer": 0.19389480892744118,
1387
+ "step": 243972
1388
+ },
1389
+ {
1390
+ "epoch": 82.0,
1391
+ "grad_norm": 9.252601623535156,
1392
+ "learning_rate": 1.2750278347887909e-05,
1393
+ "loss": 0.4595,
1394
+ "step": 246984
1395
+ },
1396
+ {
1397
+ "epoch": 82.0,
1398
+ "eval_cer": 0.03594715805361035,
1399
+ "eval_loss": 0.12681059539318085,
1400
+ "eval_runtime": 152.0339,
1401
+ "eval_samples_per_second": 239.091,
1402
+ "eval_steps_per_second": 7.472,
1403
+ "eval_wer": 0.19371841829482137,
1404
+ "step": 246984
1405
+ },
1406
+ {
1407
+ "epoch": 83.0,
1408
+ "grad_norm": 16.776588439941406,
1409
+ "learning_rate": 1.2043677143278735e-05,
1410
+ "loss": 0.4575,
1411
+ "step": 249996
1412
+ },
1413
+ {
1414
+ "epoch": 83.0,
1415
+ "eval_cer": 0.03615393008996188,
1416
+ "eval_loss": 0.12722131609916687,
1417
+ "eval_runtime": 152.0731,
1418
+ "eval_samples_per_second": 239.03,
1419
+ "eval_steps_per_second": 7.47,
1420
+ "eval_wer": 0.1942475901926808,
1421
+ "step": 249996
1422
+ },
1423
+ {
1424
+ "epoch": 84.0,
1425
+ "grad_norm": 8.937053680419922,
1426
+ "learning_rate": 1.1336606436208028e-05,
1427
+ "loss": 0.4569,
1428
+ "step": 253008
1429
+ },
1430
+ {
1431
+ "epoch": 84.0,
1432
+ "eval_cer": 0.03605260720740382,
1433
+ "eval_loss": 0.12680456042289734,
1434
+ "eval_runtime": 152.9563,
1435
+ "eval_samples_per_second": 237.65,
1436
+ "eval_steps_per_second": 7.427,
1437
+ "eval_wer": 0.19341059934770052,
1438
+ "step": 253008
1439
+ },
1440
+ {
1441
+ "epoch": 85.0,
1442
+ "grad_norm": 13.364398956298828,
1443
+ "learning_rate": 1.0630005231598857e-05,
1444
+ "loss": 0.4552,
1445
+ "step": 256020
1446
+ },
1447
+ {
1448
+ "epoch": 85.0,
1449
+ "eval_cer": 0.035670697880838785,
1450
+ "eval_loss": 0.12619073688983917,
1451
+ "eval_runtime": 156.3815,
1452
+ "eval_samples_per_second": 232.444,
1453
+ "eval_steps_per_second": 7.264,
1454
+ "eval_wer": 0.1916293998222259,
1455
+ "step": 256020
1456
+ },
1457
+ {
1458
+ "epoch": 86.0,
1459
+ "grad_norm": 9.33234691619873,
1460
+ "learning_rate": 9.923404026989685e-06,
1461
+ "loss": 0.4538,
1462
+ "step": 259032
1463
+ },
1464
+ {
1465
+ "epoch": 86.0,
1466
+ "eval_cer": 0.03549051737022648,
1467
+ "eval_loss": 0.12592804431915283,
1468
+ "eval_runtime": 156.5096,
1469
+ "eval_samples_per_second": 232.254,
1470
+ "eval_steps_per_second": 7.258,
1471
+ "eval_wer": 0.19070940162071864,
1472
+ "step": 259032
1473
+ },
1474
+ {
1475
+ "epoch": 87.0,
1476
+ "grad_norm": 4.286988735198975,
1477
+ "learning_rate": 9.216568071149744e-06,
1478
+ "loss": 0.4532,
1479
+ "step": 262044
1480
+ },
1481
+ {
1482
+ "epoch": 87.0,
1483
+ "eval_cer": 0.03551573347222057,
1484
+ "eval_loss": 0.12575581669807434,
1485
+ "eval_runtime": 155.6329,
1486
+ "eval_samples_per_second": 233.562,
1487
+ "eval_steps_per_second": 7.299,
1488
+ "eval_wer": 0.19122473895915693,
1489
+ "step": 262044
1490
+ },
1491
+ {
1492
+ "epoch": 88.0,
1493
+ "grad_norm": 7.920403957366943,
1494
+ "learning_rate": 8.509732115309804e-06,
1495
+ "loss": 0.4524,
1496
+ "step": 265056
1497
+ },
1498
+ {
1499
+ "epoch": 88.0,
1500
+ "eval_cer": 0.03555103601501229,
1501
+ "eval_loss": 0.1259673833847046,
1502
+ "eval_runtime": 166.3365,
1503
+ "eval_samples_per_second": 218.533,
1504
+ "eval_steps_per_second": 6.83,
1505
+ "eval_wer": 0.19095150641058897,
1506
+ "step": 265056
1507
+ },
1508
+ {
1509
+ "epoch": 89.0,
1510
+ "grad_norm": 9.81010913848877,
1511
+ "learning_rate": 7.802896159469865e-06,
1512
+ "loss": 0.4501,
1513
+ "step": 268068
1514
+ },
1515
+ {
1516
+ "epoch": 89.0,
1517
+ "eval_cer": 0.03596458008771536,
1518
+ "eval_loss": 0.12655647099018097,
1519
+ "eval_runtime": 155.3246,
1520
+ "eval_samples_per_second": 234.026,
1521
+ "eval_steps_per_second": 7.314,
1522
+ "eval_wer": 0.19276729233461648,
1523
+ "step": 268068
1524
+ },
1525
+ {
1526
+ "epoch": 90.0,
1527
+ "grad_norm": 10.394911766052246,
1528
+ "learning_rate": 7.096060203629924e-06,
1529
+ "loss": 0.4491,
1530
+ "step": 271080
1531
+ },
1532
+ {
1533
+ "epoch": 90.0,
1534
+ "eval_cer": 0.03546667669197752,
1535
+ "eval_loss": 0.12519720196723938,
1536
+ "eval_runtime": 177.0207,
1537
+ "eval_samples_per_second": 205.343,
1538
+ "eval_steps_per_second": 6.417,
1539
+ "eval_wer": 0.19042579315258482,
1540
+ "step": 271080
1541
+ },
1542
+ {
1543
+ "epoch": 91.0,
1544
+ "grad_norm": 9.381885528564453,
1545
+ "learning_rate": 6.3892242477899846e-06,
1546
+ "loss": 0.4486,
1547
+ "step": 274092
1548
+ },
1549
+ {
1550
+ "epoch": 91.0,
1551
+ "eval_cer": 0.03518288092589859,
1552
+ "eval_loss": 0.12525735795497894,
1553
+ "eval_runtime": 173.742,
1554
+ "eval_samples_per_second": 209.218,
1555
+ "eval_steps_per_second": 6.538,
1556
+ "eval_wer": 0.18893166073509932,
1557
+ "step": 274092
1558
+ },
1559
+ {
1560
+ "epoch": 92.0,
1561
+ "grad_norm": 26.816091537475586,
1562
+ "learning_rate": 5.682623043180811e-06,
1563
+ "loss": 0.4487,
1564
+ "step": 277104
1565
+ },
1566
+ {
1567
+ "epoch": 92.0,
1568
+ "eval_cer": 0.035369480080654846,
1569
+ "eval_loss": 0.12525933980941772,
1570
+ "eval_runtime": 153.4339,
1571
+ "eval_samples_per_second": 236.91,
1572
+ "eval_steps_per_second": 7.404,
1573
+ "eval_wer": 0.190249402519965,
1574
+ "step": 277104
1575
+ },
1576
+ {
1577
+ "epoch": 93.0,
1578
+ "grad_norm": 12.131733894348145,
1579
+ "learning_rate": 4.975552336110105e-06,
1580
+ "loss": 0.4471,
1581
+ "step": 280116
1582
+ },
1583
+ {
1584
+ "epoch": 93.0,
1585
+ "eval_cer": 0.03521634957036347,
1586
+ "eval_loss": 0.1251526027917862,
1587
+ "eval_runtime": 176.0621,
1588
+ "eval_samples_per_second": 206.461,
1589
+ "eval_steps_per_second": 6.452,
1590
+ "eval_wer": 0.1893743666365765,
1591
+ "step": 280116
1592
+ },
1593
+ {
1594
+ "epoch": 94.0,
1595
+ "grad_norm": 14.601805686950684,
1596
+ "learning_rate": 4.2687163802701646e-06,
1597
+ "loss": 0.4458,
1598
+ "step": 283128
1599
+ },
1600
+ {
1601
+ "epoch": 94.0,
1602
+ "eval_cer": 0.0351705021121924,
1603
+ "eval_loss": 0.1253127008676529,
1604
+ "eval_runtime": 177.0148,
1605
+ "eval_samples_per_second": 205.35,
1606
+ "eval_steps_per_second": 6.418,
1607
+ "eval_wer": 0.18914263776627205,
1608
+ "step": 283128
1609
+ },
1610
+ {
1611
+ "epoch": 95.0,
1612
+ "grad_norm": 3.8078722953796387,
1613
+ "learning_rate": 3.562115175660992e-06,
1614
+ "loss": 0.4449,
1615
+ "step": 286140
1616
+ },
1617
+ {
1618
+ "epoch": 95.0,
1619
+ "eval_cer": 0.035079265670431965,
1620
+ "eval_loss": 0.12475291639566422,
1621
+ "eval_runtime": 168.7285,
1622
+ "eval_samples_per_second": 215.435,
1623
+ "eval_steps_per_second": 6.733,
1624
+ "eval_wer": 0.1884266993162269,
1625
+ "step": 286140
1626
+ },
1627
+ {
1628
+ "epoch": 96.0,
1629
+ "grad_norm": 4.562527656555176,
1630
+ "learning_rate": 2.855044468590285e-06,
1631
+ "loss": 0.4434,
1632
+ "step": 289152
1633
+ },
1634
+ {
1635
+ "epoch": 96.0,
1636
+ "eval_cer": 0.035126030077766456,
1637
+ "eval_loss": 0.1246921494603157,
1638
+ "eval_runtime": 179.6539,
1639
+ "eval_samples_per_second": 202.333,
1640
+ "eval_steps_per_second": 6.323,
1641
+ "eval_wer": 0.18909421680829797,
1642
+ "step": 289152
1643
+ },
1644
+ {
1645
+ "epoch": 97.0,
1646
+ "grad_norm": 8.47780990600586,
1647
+ "learning_rate": 2.1482085127503454e-06,
1648
+ "loss": 0.4435,
1649
+ "step": 292164
1650
+ },
1651
+ {
1652
+ "epoch": 97.0,
1653
+ "eval_cer": 0.035188841095460825,
1654
+ "eval_loss": 0.12471602112054825,
1655
+ "eval_runtime": 152.6845,
1656
+ "eval_samples_per_second": 238.073,
1657
+ "eval_steps_per_second": 7.44,
1658
+ "eval_wer": 0.18912880320685088,
1659
+ "step": 292164
1660
+ },
1661
+ {
1662
+ "epoch": 98.0,
1663
+ "grad_norm": 16.65981101989746,
1664
+ "learning_rate": 1.4413725569104053e-06,
1665
+ "loss": 0.4444,
1666
+ "step": 295176
1667
+ },
1668
+ {
1669
+ "epoch": 98.0,
1670
+ "eval_cer": 0.03511594363696882,
1671
+ "eval_loss": 0.12448572367429733,
1672
+ "eval_runtime": 159.0007,
1673
+ "eval_samples_per_second": 228.615,
1674
+ "eval_steps_per_second": 7.145,
1675
+ "eval_wer": 0.1887518114626242,
1676
+ "step": 295176
1677
+ },
1678
+ {
1679
+ "epoch": 99.0,
1680
+ "grad_norm": 10.248809814453125,
1681
+ "learning_rate": 7.345366010704655e-07,
1682
+ "loss": 0.4429,
1683
+ "step": 298188
1684
+ },
1685
+ {
1686
+ "epoch": 99.0,
1687
+ "eval_cer": 0.03512648855234817,
1688
+ "eval_loss": 0.12444119900465012,
1689
+ "eval_runtime": 155.6325,
1690
+ "eval_samples_per_second": 233.563,
1691
+ "eval_steps_per_second": 7.299,
1692
+ "eval_wer": 0.18870684914450545,
1693
+ "step": 298188
1694
+ },
1695
+ {
1696
+ "epoch": 100.0,
1697
+ "grad_norm": 13.995579719543457,
1698
+ "learning_rate": 2.7935396461292875e-08,
1699
+ "loss": 0.4426,
1700
+ "step": 301200
1701
+ },
1702
+ {
1703
+ "epoch": 100.0,
1704
+ "eval_cer": 0.035049006348039057,
1705
+ "eval_loss": 0.12442902475595474,
1706
+ "eval_runtime": 173.9446,
1707
+ "eval_samples_per_second": 208.975,
1708
+ "eval_steps_per_second": 6.531,
1709
+ "eval_wer": 0.1883851956379634,
1710
+ "step": 301200
1711
+ }
1712
+ ],
1713
+ "logging_steps": 500,
1714
+ "max_steps": 301200,
1715
+ "num_input_tokens_seen": 0,
1716
+ "num_train_epochs": 100,
1717
+ "save_steps": 500,
1718
+ "stateful_callbacks": {
1719
+ "EarlyStoppingCallback": {
1720
+ "args": {
1721
+ "early_stopping_patience": 10,
1722
+ "early_stopping_threshold": 0.001
1723
+ },
1724
+ "attributes": {
1725
+ "early_stopping_patience_counter": 9
1726
+ }
1727
+ },
1728
+ "TrainerControl": {
1729
+ "args": {
1730
+ "should_epoch_stop": false,
1731
+ "should_evaluate": false,
1732
+ "should_log": false,
1733
+ "should_save": true,
1734
+ "should_training_stop": true
1735
+ },
1736
+ "attributes": {}
1737
+ }
1738
+ },
1739
+ "total_flos": 1.1064772388893474e+21,
1740
+ "train_batch_size": 64,
1741
+ "trial_name": null,
1742
+ "trial_params": null
1743
+ }
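The log entries above alternate, epoch by epoch, between a training record (with `loss`) and an evaluation record (with `eval_wer`). Below is a minimal sketch of picking out the evaluation record with the lowest WER. The three sample pairs are copied from epochs 98 to 100 above; `best_eval` is a hypothetical helper name used for illustration, not part of `transformers`.

```python
def best_eval(log_history, metric="eval_wer"):
    """Return the evaluation record with the lowest value of `metric`."""
    # Training records carry "loss" but no eval_* keys, so they are filtered out.
    evals = [rec for rec in log_history if metric in rec]
    return min(evals, key=lambda rec: rec[metric])


# Sample records copied from the log above (epochs 98-100), trimmed to a few keys.
sample_history = [
    {"epoch": 98.0, "loss": 0.4444, "step": 295176},
    {"epoch": 98.0, "eval_wer": 0.1887518114626242, "step": 295176},
    {"epoch": 99.0, "loss": 0.4429, "step": 298188},
    {"epoch": 99.0, "eval_wer": 0.18870684914450545, "step": 298188},
    {"epoch": 100.0, "loss": 0.4426, "step": 301200},
    {"epoch": 100.0, "eval_wer": 0.1883851956379634, "step": 301200},
]

best = best_eval(sample_history)
print(best["epoch"], best["eval_wer"])
```

On the full file, the same helper could be applied to the list stored under the `log_history` key after loading the JSON with `json.load`.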
checkpoint-301200/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eb00c22830dc27d32027f8d7edee3b94a76e24ae2a1d322a9afad8ebb4846cd3
+ size 5841
config.json ADDED
@@ -0,0 +1,108 @@
+ {
+   "activation_dropout": 0.0,
+   "adapter_attn_dim": null,
+   "adapter_kernel_size": 3,
+   "adapter_stride": 2,
+   "add_adapter": false,
+   "apply_spec_augment": true,
+   "architectures": [
+     "Wav2Vec2ForCTC"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "classifier_proj_size": 256,
+   "codevector_dim": 256,
+   "contrastive_logits_temperature": 0.1,
+   "conv_bias": true,
+   "conv_dim": [
+     512,
+     512,
+     512,
+     512,
+     512,
+     512,
+     512
+   ],
+   "conv_kernel": [
+     10,
+     3,
+     3,
+     3,
+     3,
+     2,
+     2
+   ],
+   "conv_stride": [
+     5,
+     2,
+     2,
+     2,
+     2,
+     2,
+     2
+   ],
+   "ctc_loss_reduction": "mean",
+   "ctc_zero_infinity": false,
+   "diversity_loss_weight": 0.1,
+   "do_stable_layer_norm": true,
+   "eos_token_id": 2,
+   "feat_extract_activation": "gelu",
+   "feat_extract_dropout": 0.0,
+   "feat_extract_norm": "layer",
+   "feat_proj_dropout": 0.0,
+   "feat_quantizer_dropout": 0.0,
+   "final_dropout": 0.0,
+   "hidden_act": "gelu",
+   "hidden_dropout": 0.0,
+   "hidden_dropout_prob": 0.0,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "layerdrop": 0.0,
+   "mask_feature_length": 10,
+   "mask_feature_min_masks": 0,
+   "mask_feature_prob": 0.0,
+   "mask_time_length": 10,
+   "mask_time_min_masks": 2,
+   "mask_time_prob": 0.65,
+   "model_type": "wav2vec2",
+   "num_adapter_layers": 3,
+   "num_attention_heads": 12,
+   "num_codevector_groups": 2,
+   "num_codevectors_per_group": 320,
+   "num_conv_pos_embedding_groups": 16,
+   "num_conv_pos_embeddings": 128,
+   "num_feat_extract_layers": 7,
+   "num_hidden_layers": 12,
+   "num_negatives": 100,
+   "output_hidden_size": 768,
+   "pad_token_id": 28,
+   "proj_codevector_dim": 256,
+   "tdnn_dilation": [
+     1,
+     2,
+     3,
+     1,
+     1
+   ],
+   "tdnn_dim": [
+     512,
+     512,
+     512,
+     512,
+     1500
+   ],
+   "tdnn_kernel": [
+     5,
+     3,
+     3,
+     1,
+     1
+   ],
+   "torch_dtype": "float32",
+   "transformers_version": "4.51.3",
+   "use_weighted_layer_sum": false,
+   "vocab_size": 32,
+   "xvector_output_dim": 512
+ }
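The `conv_kernel` and `conv_stride` lists above determine how many output frames the convolutional feature encoder produces for a given number of input samples. The sketch below applies the usual unpadded 1-D convolution length formula layer by layer. The no-padding assumption and the helper names `output_frames` and `total_stride` are illustrative and are not taken from this commit.

```python
# Values copied from config.json above.
CONV_KERNEL = [10, 3, 3, 3, 3, 2, 2]
CONV_STRIDE = [5, 2, 2, 2, 2, 2, 2]


def output_frames(num_samples):
    """Frames left after the conv stack, assuming unpadded 1-D convolutions."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNEL, CONV_STRIDE):
        # Standard output length of a convolution without padding or dilation.
        length = (length - kernel) // stride + 1
    return length


def total_stride():
    """Number of input samples between two successive output frames."""
    product = 1
    for stride in CONV_STRIDE:
        product *= stride
    return product


print(total_stride())        # 320 samples per frame step
print(output_frames(16000))  # 49 frames for 16000 input samples
```

At a 16000 Hz sampling rate, a step of 320 samples corresponds to one frame every 20 ms.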
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c2dc0ddc9facfeee4821b46da0115ab606db65b8b0961efa7222decd8a9a1b8f
+ size 377652400
preprocessor_config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "do_normalize": true,
+   "feature_extractor_type": "Wav2Vec2FeatureExtractor",
+   "feature_size": 1,
+   "padding_side": "right",
+   "padding_value": 0.0,
+   "processor_class": "Wav2Vec2Processor",
+   "return_attention_mask": false,
+   "sampling_rate": 16000
+ }
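With `do_normalize` set to `true`, each input waveform is expected to be scaled to zero mean and unit variance before it reaches the model. The sketch below shows that operation in plain Python. The epsilon value and the helper name `normalize` are assumptions made for illustration, and this is not the feature extractor's own code.

```python
import math


def normalize(samples, eps=1e-7):
    """Scale a waveform to zero mean and (approximately) unit variance."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    # eps (an assumed value) guards against division by zero on silent input.
    scale = math.sqrt(variance + eps)
    return [(x - mean) / scale for x in samples]


normalized = normalize([0.1, 0.3, -0.2, 0.4])
print(normalized)
```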
special_tokens_map.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "bos_token": "<s>",
+   "eos_token": "</s>",
+   "pad_token": "[PAD]",
+   "unk_token": "[UNK]"
+ }
tokenizer_config.json ADDED
@@ -0,0 +1,49 @@
+ {
+   "added_tokens_decoder": {
+     "27": {
+       "content": "[UNK]",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": true,
+       "single_word": false,
+       "special": false
+     },
+     "28": {
+       "content": "[PAD]",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": true,
+       "single_word": false,
+       "special": false
+     },
+     "29": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "30": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "do_lower_case": false,
+   "eos_token": "</s>",
+   "extra_special_tokens": {},
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "[PAD]",
+   "processor_class": "Wav2Vec2Processor",
+   "replace_word_delimiter_char": " ",
+   "target_lang": null,
+   "tokenizer_class": "Wav2Vec2CTCTokenizer",
+   "unk_token": "[UNK]",
+   "word_delimiter_token": "|"
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eb00c22830dc27d32027f8d7edee3b94a76e24ae2a1d322a9afad8ebb4846cd3
+ size 5841
vocab.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "'": 1,
+   "[PAD]": 28,
+   "[UNK]": 27,
+   "a": 2,
+   "b": 3,
+   "c": 4,
+   "d": 5,
+   "e": 6,
+   "f": 7,
+   "g": 8,
+   "h": 9,
+   "i": 10,
+   "j": 11,
+   "k": 12,
+   "l": 13,
+   "m": 14,
+   "n": 15,
+   "o": 16,
+   "p": 17,
+   "r": 18,
+   "s": 19,
+   "t": 20,
+   "u": 21,
+   "v": 22,
+   "w": 23,
+   "y": 24,
+   "z": 25,
+   "|": 0,
+   "ŋ": 26
+ }