alexue4 committed on
Commit
11562f2
1 Parent(s): 25a79b1

End of training

Files changed (4)
  1. README.md +107 -15
  2. pytorch_model.bin +1 -1
  3. trainer_state.json +1063 -1063
  4. training_args.bin +1 -1
README.md CHANGED
@@ -1,27 +1,119 @@
  ---
  license: mit
- language:
- - ru
- library_name: transformers
+ base_model: cointegrated/rut5-small
  tags:
- - text-generation-inference
+ - generated_from_trainer
+ model-index:
+ - name: text-normalization-ru-new
+   results: []
  ---

- # text-normalization-ru-new
- Normalization for Russian text. Couldn't find any existing solutions (besides algorithms, don't like those) so made this.
- It was designed for Silero TTS model which cant handle english and numbers for russian text to speach.
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->

- This model is a fine-tuned version of [cointegrated/rut5-small](https://huggingface.co/cointegrated/rut5-small) on https://www.kaggle.com/c/text-normalization-challenge-russian-language and additional dataset prepared by me using typical messages.
+ # text-normalization-ru-new

+ This model is a fine-tuned version of [cointegrated/rut5-small](https://huggingface.co/cointegrated/rut5-small) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.0177
+ - Loss: 0.0318
  - Mean Distance: 0
- - Max Distance: 15
+ - Max Distance: 11

  ## Model description

- Tiny T5 trained from scratch for normalizing Russian texts:
- - translating numbers into words
- - expanding abbreviations into phonetic letter combinations
- - transliterating english into russian letters
- - whatever else was in the dataset (see below)
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.001
+ - train_batch_size: 30
+ - eval_batch_size: 30
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 60
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Mean Distance | Max Distance |
+ |:-------------:|:-----:|:------:|:---------------:|:-------------:|:------------:|
+ | 0.2251 | 1.0 | 3334 | 0.1190 | 3 | 29 |
+ | 0.1179 | 2.0 | 6668 | 0.0574 | 2 | 31 |
+ | 0.0848 | 3.0 | 10002 | 0.0436 | 1 | 15 |
+ | 0.0618 | 4.0 | 13336 | 0.0359 | 1 | 20 |
+ | 0.0532 | 5.0 | 16670 | 0.0315 | 0 | 11 |
+ | 0.0446 | 6.0 | 20004 | 0.0299 | 0 | 16 |
+ | 0.0388 | 7.0 | 23338 | 0.0295 | 0 | 15 |
+ | 0.0311 | 8.0 | 26672 | 0.0287 | 0 | 15 |
+ | 0.0269 | 9.0 | 30006 | 0.0241 | 0 | 15 |
+ | 0.0232 | 10.0 | 33340 | 0.0228 | 0 | 13 |
+ | 0.0203 | 11.0 | 36674 | 0.0243 | 0 | 16 |
+ | 0.0173 | 12.0 | 40008 | 0.0250 | 0 | 15 |
+ | 0.0151 | 13.0 | 43342 | 0.0244 | 0 | 9 |
+ | 0.0136 | 14.0 | 46676 | 0.0234 | 0 | 15 |
+ | 0.0123 | 15.0 | 50010 | 0.0221 | 0 | 9 |
+ | 0.0113 | 16.0 | 53344 | 0.0244 | 0 | 12 |
+ | 0.01 | 17.0 | 56678 | 0.0226 | 0 | 13 |
+ | 0.0089 | 18.0 | 60012 | 0.0271 | 0 | 13 |
+ | 0.0085 | 19.0 | 63346 | 0.0248 | 0 | 13 |
+ | 0.0074 | 20.0 | 66680 | 0.0277 | 0 | 12 |
+ | 0.007 | 21.0 | 70014 | 0.0309 | 0 | 13 |
+ | 0.0066 | 22.0 | 73348 | 0.0306 | 0 | 11 |
+ | 0.0056 | 23.0 | 76682 | 0.0287 | 0 | 10 |
+ | 0.0053 | 24.0 | 80016 | 0.0312 | 0 | 12 |
+ | 0.0049 | 25.0 | 83350 | 0.0276 | 0 | 11 |
+ | 0.0053 | 26.0 | 86684 | 0.0308 | 0 | 10 |
+ | 0.0041 | 27.0 | 90018 | 0.0279 | 0 | 10 |
+ | 0.0041 | 28.0 | 93352 | 0.0292 | 0 | 11 |
+ | 0.0037 | 29.0 | 96686 | 0.0306 | 0 | 11 |
+ | 0.0035 | 30.0 | 100020 | 0.0272 | 0 | 12 |
+ | 0.0032 | 31.0 | 103354 | 0.0255 | 0 | 9 |
+ | 0.0031 | 32.0 | 106688 | 0.0293 | 0 | 10 |
+ | 0.0029 | 33.0 | 110022 | 0.0300 | 0 | 13 |
+ | 0.0026 | 34.0 | 113356 | 0.0305 | 0 | 11 |
+ | 0.0024 | 35.0 | 116690 | 0.0273 | 0 | 9 |
+ | 0.0023 | 36.0 | 120024 | 0.0284 | 0 | 10 |
+ | 0.0022 | 37.0 | 123358 | 0.0313 | 0 | 13 |
+ | 0.002 | 38.0 | 126692 | 0.0341 | 0 | 13 |
+ | 0.0017 | 39.0 | 130026 | 0.0301 | 0 | 13 |
+ | 0.0017 | 40.0 | 133360 | 0.0330 | 0 | 11 |
+ | 0.0016 | 41.0 | 136694 | 0.0344 | 0 | 11 |
+ | 0.0014 | 42.0 | 140028 | 0.0337 | 0 | 10 |
+ | 0.0013 | 43.0 | 143362 | 0.0292 | 0 | 12 |
+ | 0.0012 | 44.0 | 146696 | 0.0339 | 0 | 11 |
+ | 0.0012 | 45.0 | 150030 | 0.0330 | 0 | 11 |
+ | 0.001 | 46.0 | 153364 | 0.0307 | 0 | 11 |
+ | 0.001 | 47.0 | 156698 | 0.0330 | 0 | 10 |
+ | 0.0009 | 48.0 | 160032 | 0.0338 | 0 | 11 |
+ | 0.0009 | 49.0 | 163366 | 0.0288 | 0 | 10 |
+ | 0.0008 | 50.0 | 166700 | 0.0256 | 0 | 10 |
+ | 0.0007 | 51.0 | 170034 | 0.0284 | 0 | 11 |
+ | 0.0006 | 52.0 | 173368 | 0.0342 | 0 | 10 |
+ | 0.0006 | 53.0 | 176702 | 0.0312 | 0 | 10 |
+ | 0.0005 | 54.0 | 180036 | 0.0326 | 0 | 10 |
+ | 0.0005 | 55.0 | 183370 | 0.0304 | 0 | 11 |
+ | 0.0005 | 56.0 | 186704 | 0.0300 | 0 | 11 |
+ | 0.0004 | 57.0 | 190038 | 0.0313 | 0 | 11 |
+ | 0.0003 | 58.0 | 193372 | 0.0321 | 0 | 11 |
+ | 0.0003 | 59.0 | 196706 | 0.0316 | 0 | 10 |
+ | 0.0004 | 60.0 | 200040 | 0.0318 | 0 | 11 |
+
+
+ ### Framework versions
+
+ - Transformers 4.32.1
+ - Pytorch 2.0.1+cu117
+ - Datasets 2.14.4
+ - Tokenizers 0.13.3
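
The hyperparameters in the new README (`learning_rate: 0.001`, `lr_scheduler_type: linear`, `lr_scheduler_warmup_ratio: 0.1`), together with the 200040 total steps in the results table, determine the learning rate at every step. Below is a minimal sketch of that schedule in plain Python, assuming the usual linear warmup followed by linear decay to zero. The function name `linear_schedule_lr` and the integer warmup count (10% of the total steps) are illustrative choices, not taken from the commit.

```python
def linear_schedule_lr(step: int, total_steps: int, warmup_steps: int, peak_lr: float) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from peak_lr at the end of warmup to 0 at the last step.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Values from the README: 60 epochs * 3334 steps per epoch = 200040 steps.
TOTAL_STEPS = 200040
WARMUP_STEPS = TOTAL_STEPS // 10  # assumption: 10% of the run, i.e. 20004 steps (6 epochs)
PEAK_LR = 0.001

for step in (1, 3334, WARMUP_STEPS, 100020, TOTAL_STEPS):
    print(step, linear_schedule_lr(step, TOTAL_STEPS, WARMUP_STEPS, PEAK_LR))
```

Under these assumptions the rate peaks at 0.001 at step 20004 (the end of epoch 6) and reaches 0 at step 200040. The same formula with 197880 total steps and 19788 warmup steps gives about 5.05e-08 at step 1, which agrees with the first `learning_rate` entry removed from `trainer_state.json` in this commit.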
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:67c7f076dd09ccc14ee16c69cabf6f1ca5b674bd9bd1bf502d509b46230e8f17
+ oid sha256:640ab93c6e6932ab1eb56e93439e8e20cf9ed1484ccd6ca0aa7250c2acf8ab00
  size 258643461
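
The `pytorch_model.bin` entry in this diff is a Git LFS pointer rather than the weights themselves: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes. Only the `oid` changes here, so the new checkpoint has the same byte size as the old one. A small sketch that parses such a pointer follows; the helper name `parse_lfs_pointer` is hypothetical.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # Split "sha256:<hex>" into the hash algorithm and the digest.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# The new pointer from this commit.
new_pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:640ab93c6e6932ab1eb56e93439e8e20cf9ed1484ccd6ca0aa7250c2acf8ab00
size 258643461
"""

info = parse_lfs_pointer(new_pointer)
print(info["hash_algo"], info["digest"][:8], info["size"])
```

Comparing the parsed `digest` of two pointers is enough to tell whether the underlying binary changed, without downloading either file.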
trainer_state.json CHANGED
@@ -3,1826 +3,1826 @@
3
  "best_model_checkpoint": null,
4
  "epoch": 60.0,
5
  "eval_steps": 500,
6
- "global_step": 197880,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
  "epoch": 0.0,
13
- "learning_rate": 5.0535678188801295e-08,
14
- "loss": 12.7149,
15
  "step": 1
16
  },
17
  {
18
  "epoch": 0.3,
19
- "learning_rate": 5.0030321406913285e-05,
20
- "loss": 3.3584,
21
- "step": 990
22
  },
23
  {
24
  "epoch": 0.6,
25
- "learning_rate": 0.00010006064281382657,
26
- "loss": 0.3384,
27
- "step": 1980
28
  },
29
  {
30
  "epoch": 0.9,
31
- "learning_rate": 0.00015009096422073984,
32
- "loss": 0.2236,
33
- "step": 2970
34
  },
35
  {
36
  "epoch": 1.0,
37
- "eval_loss": 0.11203870922327042,
38
- "eval_max_distance": 133,
39
- "eval_mean_distance": 5,
40
- "eval_runtime": 0.5965,
41
- "eval_samples_per_second": 83.828,
42
- "eval_steps_per_second": 3.353,
43
- "step": 3298
44
  },
45
  {
46
  "epoch": 1.2,
47
- "learning_rate": 0.00020012128562765314,
48
- "loss": 0.1679,
49
- "step": 3960
50
  },
51
  {
52
  "epoch": 1.5,
53
- "learning_rate": 0.0002501516070345664,
54
- "loss": 0.1395,
55
- "step": 4950
56
  },
57
  {
58
  "epoch": 1.8,
59
- "learning_rate": 0.0003001819284414797,
60
  "loss": 0.1179,
61
- "step": 5940
62
  },
63
  {
64
  "epoch": 2.0,
65
- "eval_loss": 0.05475025996565819,
66
- "eval_max_distance": 82,
67
- "eval_mean_distance": 3,
68
- "eval_runtime": 0.5422,
69
- "eval_samples_per_second": 92.223,
70
- "eval_steps_per_second": 3.689,
71
- "step": 6596
72
  },
73
  {
74
  "epoch": 2.1,
75
- "learning_rate": 0.0003502122498483929,
76
- "loss": 0.1022,
77
- "step": 6930
78
  },
79
  {
80
  "epoch": 2.4,
81
- "learning_rate": 0.0004002425712553063,
82
- "loss": 0.0917,
83
- "step": 7920
84
  },
85
  {
86
  "epoch": 2.7,
87
- "learning_rate": 0.0004502728926622195,
88
- "loss": 0.0829,
89
- "step": 8910
90
  },
91
  {
92
  "epoch": 3.0,
93
- "eval_loss": 0.042510777711868286,
94
- "eval_max_distance": 46,
95
  "eval_mean_distance": 1,
96
- "eval_runtime": 0.5158,
97
- "eval_samples_per_second": 96.928,
98
- "eval_steps_per_second": 3.877,
99
- "step": 9894
100
  },
101
  {
102
  "epoch": 3.0,
103
- "learning_rate": 0.0005003032140691328,
104
- "loss": 0.0769,
105
- "step": 9900
106
  },
107
  {
108
  "epoch": 3.3,
109
- "learning_rate": 0.0005503335354760462,
110
- "loss": 0.0667,
111
- "step": 10890
112
  },
113
  {
114
  "epoch": 3.6,
115
- "learning_rate": 0.0006003638568829594,
116
  "loss": 0.0653,
117
- "step": 11880
118
  },
119
  {
120
  "epoch": 3.9,
121
- "learning_rate": 0.0006503941782898727,
122
- "loss": 0.0643,
123
- "step": 12870
124
  },
125
  {
126
  "epoch": 4.0,
127
- "eval_loss": 0.03110930137336254,
128
- "eval_max_distance": 64,
129
  "eval_mean_distance": 1,
130
- "eval_runtime": 0.4848,
131
- "eval_samples_per_second": 103.129,
132
- "eval_steps_per_second": 4.125,
133
- "step": 13192
134
  },
135
  {
136
  "epoch": 4.2,
137
- "learning_rate": 0.0007004244996967858,
138
- "loss": 0.0589,
139
- "step": 13860
140
  },
141
  {
142
  "epoch": 4.5,
143
- "learning_rate": 0.0007504548211036993,
144
- "loss": 0.0549,
145
- "step": 14850
146
  },
147
  {
148
  "epoch": 4.8,
149
- "learning_rate": 0.0008004851425106126,
150
- "loss": 0.0538,
151
- "step": 15840
152
  },
153
  {
154
  "epoch": 5.0,
155
- "eval_loss": 0.026651622727513313,
156
- "eval_max_distance": 48,
157
- "eval_mean_distance": 1,
158
- "eval_runtime": 0.5057,
159
- "eval_samples_per_second": 98.878,
160
- "eval_steps_per_second": 3.955,
161
- "step": 16490
162
  },
163
  {
164
  "epoch": 5.1,
165
- "learning_rate": 0.0008505154639175257,
166
- "loss": 0.048,
167
- "step": 16830
168
  },
169
  {
170
  "epoch": 5.4,
171
- "learning_rate": 0.000900545785324439,
172
- "loss": 0.0461,
173
- "step": 17820
174
  },
175
  {
176
  "epoch": 5.7,
177
- "learning_rate": 0.0009505761067313523,
178
- "loss": 0.0469,
179
- "step": 18810
180
  },
181
  {
182
  "epoch": 6.0,
183
- "eval_loss": 0.039574604481458664,
184
- "eval_max_distance": 80,
185
- "eval_mean_distance": 2,
186
- "eval_runtime": 0.5179,
187
- "eval_samples_per_second": 96.548,
188
- "eval_steps_per_second": 3.862,
189
- "step": 19788
190
  },
191
  {
192
  "epoch": 6.0,
193
- "learning_rate": 0.0009999326190957482,
194
- "loss": 0.0464,
195
- "step": 19800
196
  },
197
  {
198
- "epoch": 6.3,
199
- "learning_rate": 0.0009943736944949802,
200
- "loss": 0.0393,
201
- "step": 20790
202
  },
203
  {
204
- "epoch": 6.6,
205
- "learning_rate": 0.000988814769894212,
206
- "loss": 0.0426,
207
- "step": 21780
208
  },
209
  {
210
- "epoch": 6.9,
211
- "learning_rate": 0.000983255845293444,
212
- "loss": 0.0385,
213
- "step": 22770
214
  },
215
  {
216
  "epoch": 7.0,
217
- "eval_loss": 0.026188833639025688,
218
- "eval_max_distance": 73,
219
- "eval_mean_distance": 2,
220
- "eval_runtime": 0.4896,
221
- "eval_samples_per_second": 102.115,
222
- "eval_steps_per_second": 4.085,
223
- "step": 23086
224
  },
225
  {
226
- "epoch": 7.2,
227
- "learning_rate": 0.0009776969206926756,
228
- "loss": 0.034,
229
- "step": 23760
230
  },
231
  {
232
- "epoch": 7.5,
233
- "learning_rate": 0.0009721379960919076,
234
- "loss": 0.0315,
235
- "step": 24750
236
  },
237
  {
238
- "epoch": 7.8,
239
- "learning_rate": 0.0009665790714911395,
240
- "loss": 0.0316,
241
- "step": 25740
242
  },
243
  {
244
  "epoch": 8.0,
245
- "eval_loss": 0.02234221063554287,
246
- "eval_max_distance": 40,
247
- "eval_mean_distance": 1,
248
- "eval_runtime": 0.4837,
249
- "eval_samples_per_second": 103.365,
250
- "eval_steps_per_second": 4.135,
251
- "step": 26384
252
  },
253
  {
254
- "epoch": 8.1,
255
- "learning_rate": 0.0009610201468903713,
256
- "loss": 0.0305,
257
- "step": 26730
258
  },
259
  {
260
  "epoch": 8.41,
261
- "learning_rate": 0.0009554612222896032,
262
- "loss": 0.0271,
263
- "step": 27720
264
  },
265
  {
266
  "epoch": 8.71,
267
- "learning_rate": 0.0009499022976888349,
268
- "loss": 0.0263,
269
- "step": 28710
270
  },
271
  {
272
  "epoch": 9.0,
273
- "eval_loss": 0.023996921256184578,
274
- "eval_max_distance": 69,
275
- "eval_mean_distance": 1,
276
- "eval_runtime": 0.4894,
277
- "eval_samples_per_second": 102.167,
278
- "eval_steps_per_second": 4.087,
279
- "step": 29682
280
  },
281
  {
282
  "epoch": 9.01,
283
- "learning_rate": 0.0009443433730880669,
284
- "loss": 0.0282,
285
- "step": 29700
286
  },
287
  {
288
  "epoch": 9.31,
289
- "learning_rate": 0.0009387844484872987,
290
- "loss": 0.0229,
291
- "step": 30690
292
  },
293
  {
294
  "epoch": 9.61,
295
- "learning_rate": 0.0009332255238865306,
296
- "loss": 0.0226,
297
- "step": 31680
298
  },
299
  {
300
  "epoch": 9.91,
301
- "learning_rate": 0.0009276665992857625,
302
- "loss": 0.0226,
303
- "step": 32670
304
  },
305
  {
306
  "epoch": 10.0,
307
- "eval_loss": 0.02030733972787857,
308
- "eval_max_distance": 60,
309
- "eval_mean_distance": 1,
310
- "eval_runtime": 0.4797,
311
- "eval_samples_per_second": 104.236,
312
- "eval_steps_per_second": 4.169,
313
- "step": 32980
314
  },
315
  {
316
  "epoch": 10.21,
317
- "learning_rate": 0.0009221076746849943,
318
- "loss": 0.0209,
319
- "step": 33660
320
  },
321
  {
322
  "epoch": 10.51,
323
- "learning_rate": 0.0009165487500842261,
324
- "loss": 0.02,
325
- "step": 34650
326
  },
327
  {
328
  "epoch": 10.81,
329
- "learning_rate": 0.000910989825483458,
330
  "loss": 0.0203,
331
- "step": 35640
332
  },
333
  {
334
  "epoch": 11.0,
335
- "eval_loss": 0.017732510343194008,
336
- "eval_max_distance": 54,
337
- "eval_mean_distance": 1,
338
- "eval_runtime": 0.4814,
339
- "eval_samples_per_second": 103.858,
340
- "eval_steps_per_second": 4.154,
341
- "step": 36278
342
  },
343
  {
344
  "epoch": 11.11,
345
- "learning_rate": 0.0009054309008826899,
346
- "loss": 0.0183,
347
- "step": 36630
348
  },
349
  {
350
  "epoch": 11.41,
351
- "learning_rate": 0.0008998719762819217,
352
- "loss": 0.0174,
353
- "step": 37620
354
  },
355
  {
356
  "epoch": 11.71,
357
- "learning_rate": 0.0008943130516811536,
358
- "loss": 0.0178,
359
- "step": 38610
360
  },
361
  {
362
  "epoch": 12.0,
363
- "eval_loss": 0.018777821213006973,
364
- "eval_max_distance": 61,
365
- "eval_mean_distance": 1,
366
- "eval_runtime": 0.4893,
367
- "eval_samples_per_second": 102.185,
368
- "eval_steps_per_second": 4.087,
369
- "step": 39576
370
  },
371
  {
372
  "epoch": 12.01,
373
- "learning_rate": 0.0008887541270803853,
374
- "loss": 0.0174,
375
- "step": 39600
376
  },
377
  {
378
  "epoch": 12.31,
379
- "learning_rate": 0.0008831952024796173,
380
- "loss": 0.0153,
381
- "step": 40590
382
  },
383
  {
384
  "epoch": 12.61,
385
- "learning_rate": 0.0008776362778788492,
386
- "loss": 0.015,
387
- "step": 41580
388
  },
389
  {
390
  "epoch": 12.91,
391
- "learning_rate": 0.000872077353278081,
392
- "loss": 0.0154,
393
- "step": 42570
394
  },
395
  {
396
  "epoch": 13.0,
397
- "eval_loss": 0.029613599181175232,
398
- "eval_max_distance": 65,
399
- "eval_mean_distance": 1,
400
- "eval_runtime": 0.4669,
401
- "eval_samples_per_second": 107.079,
402
- "eval_steps_per_second": 4.283,
403
- "step": 42874
404
  },
405
  {
406
  "epoch": 13.21,
407
- "learning_rate": 0.0008665184286773129,
408
- "loss": 0.014,
409
- "step": 43560
410
  },
411
  {
412
  "epoch": 13.51,
413
- "learning_rate": 0.0008609595040765447,
414
- "loss": 0.0135,
415
- "step": 44550
416
  },
417
  {
418
  "epoch": 13.81,
419
- "learning_rate": 0.0008554005794757766,
420
- "loss": 0.0138,
421
- "step": 45540
422
  },
423
  {
424
  "epoch": 14.0,
425
- "eval_loss": 0.02011469565331936,
426
- "eval_max_distance": 55,
427
- "eval_mean_distance": 1,
428
- "eval_runtime": 0.5034,
429
- "eval_samples_per_second": 99.332,
430
- "eval_steps_per_second": 3.973,
431
- "step": 46172
432
  },
433
  {
434
  "epoch": 14.11,
435
- "learning_rate": 0.0008498416548750084,
436
- "loss": 0.0128,
437
- "step": 46530
438
  },
439
  {
440
  "epoch": 14.41,
441
- "learning_rate": 0.0008442827302742403,
442
  "loss": 0.0121,
443
- "step": 47520
444
  },
445
  {
446
  "epoch": 14.71,
447
- "learning_rate": 0.0008387238056734722,
448
- "loss": 0.012,
449
- "step": 48510
450
  },
451
  {
452
  "epoch": 15.0,
453
- "eval_loss": 0.026753582060337067,
454
- "eval_max_distance": 67,
455
- "eval_mean_distance": 1,
456
- "eval_runtime": 0.4716,
457
- "eval_samples_per_second": 106.031,
458
- "eval_steps_per_second": 4.241,
459
- "step": 49470
460
  },
461
  {
462
  "epoch": 15.01,
463
- "learning_rate": 0.000833164881072704,
464
- "loss": 0.0123,
465
- "step": 49500
466
  },
467
  {
468
  "epoch": 15.31,
469
- "learning_rate": 0.0008276059564719359,
470
- "loss": 0.0104,
471
- "step": 50490
472
  },
473
  {
474
  "epoch": 15.61,
475
- "learning_rate": 0.0008220470318711677,
476
- "loss": 0.0109,
477
- "step": 51480
478
  },
479
  {
480
  "epoch": 15.91,
481
- "learning_rate": 0.0008164881072703996,
482
- "loss": 0.0109,
483
- "step": 52470
484
  },
485
  {
486
  "epoch": 16.0,
487
- "eval_loss": 0.01633359119296074,
488
- "eval_max_distance": 35,
489
- "eval_mean_distance": 1,
490
- "eval_runtime": 0.4971,
491
- "eval_samples_per_second": 100.579,
492
- "eval_steps_per_second": 4.023,
493
- "step": 52768
494
  },
495
  {
496
  "epoch": 16.21,
497
- "learning_rate": 0.0008109291826696314,
498
- "loss": 0.0098,
499
- "step": 53460
500
  },
501
  {
502
  "epoch": 16.51,
503
- "learning_rate": 0.0008053702580688633,
504
- "loss": 0.0094,
505
- "step": 54450
506
  },
507
  {
508
  "epoch": 16.81,
509
- "learning_rate": 0.0007998113334680952,
510
- "loss": 0.0105,
511
- "step": 55440
512
  },
513
  {
514
  "epoch": 17.0,
515
- "eval_loss": 0.013592842034995556,
516
- "eval_max_distance": 26,
517
- "eval_mean_distance": 1,
518
- "eval_runtime": 0.48,
519
- "eval_samples_per_second": 104.157,
520
- "eval_steps_per_second": 4.166,
521
- "step": 56066
522
  },
523
  {
524
  "epoch": 17.11,
525
- "learning_rate": 0.000794252408867327,
526
- "loss": 0.0097,
527
- "step": 56430
528
  },
529
  {
530
  "epoch": 17.41,
531
- "learning_rate": 0.0007886934842665589,
532
- "loss": 0.0083,
533
- "step": 57420
534
  },
535
  {
536
  "epoch": 17.71,
537
- "learning_rate": 0.0007831345596657907,
538
- "loss": 0.0092,
539
- "step": 58410
540
  },
541
  {
542
  "epoch": 18.0,
543
- "eval_loss": 0.020196767523884773,
544
- "eval_max_distance": 65,
545
- "eval_mean_distance": 1,
546
- "eval_runtime": 0.4567,
547
- "eval_samples_per_second": 109.487,
548
- "eval_steps_per_second": 4.379,
549
- "step": 59364
550
  },
551
  {
552
  "epoch": 18.01,
553
- "learning_rate": 0.0007775756350650226,
554
- "loss": 0.009,
555
- "step": 59400
556
  },
557
  {
558
  "epoch": 18.31,
559
- "learning_rate": 0.0007720167104642545,
560
  "loss": 0.0075,
561
- "step": 60390
562
  },
563
  {
564
  "epoch": 18.61,
565
- "learning_rate": 0.0007664577858634864,
566
- "loss": 0.0078,
567
- "step": 61380
568
  },
569
  {
570
- "epoch": 18.91,
571
- "learning_rate": 0.0007608988612627181,
572
- "loss": 0.0087,
573
- "step": 62370
574
  },
575
  {
576
  "epoch": 19.0,
577
- "eval_loss": 0.02213277295231819,
578
- "eval_max_distance": 65,
579
- "eval_mean_distance": 1,
580
- "eval_runtime": 0.4707,
581
- "eval_samples_per_second": 106.233,
582
- "eval_steps_per_second": 4.249,
583
- "step": 62662
584
  },
585
  {
586
- "epoch": 19.21,
587
- "learning_rate": 0.00075533993666195,
588
- "loss": 0.0077,
589
- "step": 63360
590
  },
591
  {
592
- "epoch": 19.51,
593
- "learning_rate": 0.0007497810120611818,
594
- "loss": 0.0071,
595
- "step": 64350
596
  },
597
  {
598
- "epoch": 19.81,
599
- "learning_rate": 0.0007442220874604138,
600
- "loss": 0.0075,
601
- "step": 65340
602
  },
603
  {
604
  "epoch": 20.0,
605
- "eval_loss": 0.020336275920271873,
606
- "eval_max_distance": 33,
607
- "eval_mean_distance": 1,
608
- "eval_runtime": 0.4773,
609
- "eval_samples_per_second": 104.749,
610
- "eval_steps_per_second": 4.19,
611
- "step": 65960
612
  },
613
  {
614
- "epoch": 20.11,
615
- "learning_rate": 0.0007386631628596457,
616
- "loss": 0.0073,
617
- "step": 66330
618
  },
619
  {
620
- "epoch": 20.41,
621
- "learning_rate": 0.0007331042382588774,
622
- "loss": 0.0063,
623
- "step": 67320
624
  },
625
  {
626
- "epoch": 20.71,
627
- "learning_rate": 0.0007275453136581093,
628
- "loss": 0.0067,
629
- "step": 68310
630
  },
631
  {
632
  "epoch": 21.0,
633
- "eval_loss": 0.022562623023986816,
634
- "eval_max_distance": 26,
635
- "eval_mean_distance": 1,
636
- "eval_runtime": 0.5033,
637
- "eval_samples_per_second": 99.35,
638
- "eval_steps_per_second": 3.974,
639
- "step": 69258
640
  },
641
  {
642
- "epoch": 21.01,
643
- "learning_rate": 0.0007219863890573411,
644
- "loss": 0.007,
645
- "step": 69300
646
  },
647
  {
648
- "epoch": 21.31,
649
- "learning_rate": 0.000716427464456573,
650
- "loss": 0.0061,
651
- "step": 70290
652
  },
653
  {
654
- "epoch": 21.61,
655
- "learning_rate": 0.0007108685398558049,
656
- "loss": 0.006,
657
- "step": 71280
658
  },
659
  {
660
- "epoch": 21.91,
661
- "learning_rate": 0.0007053096152550368,
662
- "loss": 0.0062,
663
- "step": 72270
664
  },
665
  {
666
  "epoch": 22.0,
667
- "eval_loss": 0.01839238964021206,
668
- "eval_max_distance": 24,
669
- "eval_mean_distance": 1,
670
- "eval_runtime": 0.4856,
671
- "eval_samples_per_second": 102.959,
672
- "eval_steps_per_second": 4.118,
673
- "step": 72556
674
  },
675
  {
676
- "epoch": 22.21,
677
- "learning_rate": 0.0006997506906542685,
678
- "loss": 0.0057,
679
- "step": 73260
680
  },
681
  {
682
- "epoch": 22.51,
683
- "learning_rate": 0.0006941917660535004,
684
- "loss": 0.0058,
685
- "step": 74250
686
  },
687
  {
688
- "epoch": 22.81,
689
- "learning_rate": 0.0006886328414527323,
690
- "loss": 0.0059,
691
- "step": 75240
692
  },
693
  {
694
  "epoch": 23.0,
695
- "eval_loss": 0.013111269101500511,
696
- "eval_max_distance": 18,
697
  "eval_mean_distance": 0,
698
- "eval_runtime": 0.5001,
699
- "eval_samples_per_second": 99.983,
700
- "eval_steps_per_second": 3.999,
701
- "step": 75854
702
  },
703
  {
704
- "epoch": 23.11,
705
- "learning_rate": 0.0006830739168519642,
706
- "loss": 0.0055,
707
- "step": 76230
708
  },
709
  {
710
- "epoch": 23.41,
711
- "learning_rate": 0.0006775149922511961,
712
- "loss": 0.0051,
713
- "step": 77220
714
  },
715
  {
716
- "epoch": 23.71,
717
- "learning_rate": 0.0006719560676504279,
718
- "loss": 0.0054,
719
- "step": 78210
720
  },
721
  {
722
  "epoch": 24.0,
723
- "eval_loss": 0.026959825307130814,
724
- "eval_max_distance": 58,
725
- "eval_mean_distance": 1,
726
- "eval_runtime": 0.4725,
727
- "eval_samples_per_second": 105.825,
728
- "eval_steps_per_second": 4.233,
729
- "step": 79152
730
  },
731
  {
732
- "epoch": 24.01,
733
- "learning_rate": 0.0006663971430496597,
734
- "loss": 0.0055,
735
- "step": 79200
736
  },
737
  {
738
- "epoch": 24.31,
739
- "learning_rate": 0.0006608382184488915,
740
- "loss": 0.0046,
741
- "step": 80190
742
  },
743
  {
744
- "epoch": 24.61,
745
- "learning_rate": 0.0006552792938481235,
746
- "loss": 0.005,
747
- "step": 81180
748
  },
749
  {
750
  "epoch": 24.92,
751
- "learning_rate": 0.0006497203692473554,
752
- "loss": 0.0052,
753
- "step": 82170
754
  },
755
  {
756
  "epoch": 25.0,
757
- "eval_loss": 0.024379713460803032,
758
- "eval_max_distance": 45,
759
- "eval_mean_distance": 1,
760
- "eval_runtime": 0.47,
761
- "eval_samples_per_second": 106.387,
762
- "eval_steps_per_second": 4.255,
763
- "step": 82450
764
  },
765
  {
766
  "epoch": 25.22,
767
- "learning_rate": 0.0006441614446465872,
768
- "loss": 0.0048,
769
- "step": 83160
770
  },
771
  {
772
  "epoch": 25.52,
773
- "learning_rate": 0.000638602520045819,
774
  "loss": 0.0045,
775
- "step": 84150
776
  },
777
  {
778
  "epoch": 25.82,
779
- "learning_rate": 0.0006330435954450508,
780
- "loss": 0.0044,
781
- "step": 85140
782
  },
783
  {
784
  "epoch": 26.0,
785
- "eval_loss": 0.014908027835190296,
786
- "eval_max_distance": 23,
787
- "eval_mean_distance": 1,
788
- "eval_runtime": 0.4819,
789
- "eval_samples_per_second": 103.748,
790
- "eval_steps_per_second": 4.15,
791
- "step": 85748
792
  },
793
  {
794
  "epoch": 26.12,
795
- "learning_rate": 0.0006274846708442828,
796
- "loss": 0.0044,
797
- "step": 86130
798
  },
799
  {
800
  "epoch": 26.42,
801
- "learning_rate": 0.0006219257462435146,
802
- "loss": 0.0042,
803
- "step": 87120
804
  },
805
  {
806
  "epoch": 26.72,
807
- "learning_rate": 0.0006163668216427465,
808
- "loss": 0.0043,
809
- "step": 88110
810
  },
811
  {
812
  "epoch": 27.0,
813
- "eval_loss": 0.0256387647241354,
814
- "eval_max_distance": 63,
815
- "eval_mean_distance": 1,
816
- "eval_runtime": 0.5104,
817
- "eval_samples_per_second": 97.954,
818
- "eval_steps_per_second": 3.918,
819
- "step": 89046
820
  },
821
  {
822
  "epoch": 27.02,
823
- "learning_rate": 0.0006108078970419783,
824
  "loss": 0.0043,
825
- "step": 89100
826
  },
827
  {
828
  "epoch": 27.32,
829
- "learning_rate": 0.0006052489724412101,
830
- "loss": 0.004,
831
- "step": 90090
832
  },
833
  {
834
  "epoch": 27.62,
835
- "learning_rate": 0.0005996900478404421,
836
- "loss": 0.0037,
837
- "step": 91080
838
  },
839
  {
840
  "epoch": 27.92,
841
- "learning_rate": 0.0005941311232396739,
842
- "loss": 0.0038,
843
- "step": 92070
844
  },
845
  {
846
  "epoch": 28.0,
847
- "eval_loss": 0.017227506265044212,
848
- "eval_max_distance": 30,
849
- "eval_mean_distance": 1,
850
- "eval_runtime": 0.4632,
851
- "eval_samples_per_second": 107.934,
852
- "eval_steps_per_second": 4.317,
853
- "step": 92344
854
  },
855
  {
856
  "epoch": 28.22,
857
- "learning_rate": 0.0005885721986389058,
858
  "loss": 0.0037,
859
- "step": 93060
860
  },
861
  {
862
  "epoch": 28.52,
863
- "learning_rate": 0.0005830132740381376,
864
- "loss": 0.0038,
865
- "step": 94050
866
  },
867
  {
868
  "epoch": 28.82,
869
- "learning_rate": 0.0005774543494373694,
870
- "loss": 0.0036,
871
- "step": 95040
872
  },
873
  {
874
  "epoch": 29.0,
875
- "eval_loss": 0.022354494780302048,
876
- "eval_max_distance": 37,
877
- "eval_mean_distance": 1,
878
- "eval_runtime": 0.4846,
879
- "eval_samples_per_second": 103.187,
880
- "eval_steps_per_second": 4.127,
881
- "step": 95642
882
  },
883
  {
884
  "epoch": 29.12,
885
- "learning_rate": 0.0005718954248366013,
886
- "loss": 0.0037,
887
- "step": 96030
888
  },
889
  {
890
  "epoch": 29.42,
891
- "learning_rate": 0.0005663365002358332,
892
  "loss": 0.0033,
893
- "step": 97020
894
  },
895
  {
896
  "epoch": 29.72,
897
- "learning_rate": 0.000560777575635065,
898
- "loss": 0.0033,
899
- "step": 98010
900
  },
901
  {
902
  "epoch": 30.0,
903
- "eval_loss": 0.01936698891222477,
904
- "eval_max_distance": 30,
905
- "eval_mean_distance": 1,
906
- "eval_runtime": 0.4829,
907
- "eval_samples_per_second": 103.544,
908
- "eval_steps_per_second": 4.142,
909
- "step": 98940
910
  },
911
  {
912
  "epoch": 30.02,
913
- "learning_rate": 0.0005552186510342969,
914
- "loss": 0.0035,
915
- "step": 99000
916
  },
917
  {
918
  "epoch": 30.32,
919
- "learning_rate": 0.0005496597264335288,
920
  "loss": 0.003,
921
- "step": 99990
922
  },
923
  {
924
  "epoch": 30.62,
925
- "learning_rate": 0.0005441008018327606,
926
- "loss": 0.0033,
927
- "step": 100980
928
  },
929
  {
930
  "epoch": 30.92,
931
- "learning_rate": 0.0005385418772319925,
932
- "loss": 0.0031,
933
- "step": 101970
934
  },
935
  {
936
  "epoch": 31.0,
937
- "eval_loss": 0.023793019354343414,
938
- "eval_max_distance": 59,
939
- "eval_mean_distance": 1,
940
- "eval_runtime": 0.5012,
941
- "eval_samples_per_second": 99.754,
942
- "eval_steps_per_second": 3.99,
943
- "step": 102238
944
  },
945
  {
946
  "epoch": 31.22,
947
- "learning_rate": 0.0005329829526312243,
948
- "loss": 0.0029,
949
- "step": 102960
950
  },
951
  {
952
- "epoch": 31.52,
953
- "learning_rate": 0.0005274240280304562,
954
- "loss": 0.003,
955
- "step": 103950
956
  },
957
  {
958
- "epoch": 31.82,
959
- "learning_rate": 0.000521865103429688,
960
- "loss": 0.003,
961
- "step": 104940
962
  },
963
  {
964
  "epoch": 32.0,
965
- "eval_loss": 0.02003033086657524,
966
- "eval_max_distance": 28,
967
- "eval_mean_distance": 1,
968
- "eval_runtime": 0.475,
969
- "eval_samples_per_second": 105.268,
970
- "eval_steps_per_second": 4.211,
971
- "step": 105536
972
  },
973
  {
974
- "epoch": 32.12,
975
- "learning_rate": 0.00051630617882892,
976
  "loss": 0.0028,
977
- "step": 105930
978
  },
979
  {
980
- "epoch": 32.42,
981
- "learning_rate": 0.0005107472542281517,
982
- "loss": 0.0027,
983
- "step": 106920
984
  },
985
  {
986
- "epoch": 32.72,
987
- "learning_rate": 0.0005051883296273836,
988
- "loss": 0.0028,
989
- "step": 107910
990
  },
991
  {
992
  "epoch": 33.0,
993
- "eval_loss": 0.01606147363781929,
994
- "eval_max_distance": 18,
995
  "eval_mean_distance": 0,
996
- "eval_runtime": 0.4673,
997
- "eval_samples_per_second": 107.008,
998
- "eval_steps_per_second": 4.28,
999
- "step": 108834
1000
  },
1001
  {
1002
- "epoch": 33.02,
1003
- "learning_rate": 0.0004996294050266155,
1004
- "loss": 0.0028,
1005
- "step": 108900
1006
  },
1007
  {
1008
- "epoch": 33.32,
1009
- "learning_rate": 0.0004940704804258473,
1010
- "loss": 0.0026,
1011
- "step": 109890
1012
  },
1013
  {
1014
- "epoch": 33.62,
1015
- "learning_rate": 0.0004885115558250792,
1016
  "loss": 0.0026,
1017
- "step": 110880
1018
  },
1019
  {
1020
- "epoch": 33.92,
1021
- "learning_rate": 0.00048295263122431103,
1022
- "loss": 0.0027,
1023
- "step": 111870
1024
  },
1025
  {
1026
  "epoch": 34.0,
1027
- "eval_loss": 0.021506933495402336,
1028
- "eval_max_distance": 26,
1029
- "eval_mean_distance": 1,
1030
- "eval_runtime": 0.4763,
1031
- "eval_samples_per_second": 104.968,
1032
- "eval_steps_per_second": 4.199,
1033
- "step": 112132
1034
  },
1035
  {
1036
- "epoch": 34.22,
1037
- "learning_rate": 0.00047739370662354294,
1038
- "loss": 0.0024,
1039
- "step": 112860
1040
  },
1041
  {
1042
- "epoch": 34.52,
1043
- "learning_rate": 0.00047183478202277474,
1044
  "loss": 0.0023,
1045
- "step": 113850
1046
  },
1047
  {
1048
- "epoch": 34.82,
1049
- "learning_rate": 0.0004662758574220066,
1050
- "loss": 0.0025,
1051
- "step": 114840
1052
  },
1053
  {
1054
  "epoch": 35.0,
1055
- "eval_loss": 0.019841769710183144,
1056
- "eval_max_distance": 19,
1057
  "eval_mean_distance": 0,
1058
- "eval_runtime": 0.4767,
1059
- "eval_samples_per_second": 104.884,
1060
- "eval_steps_per_second": 4.195,
1061
- "step": 115430
1062
  },
1063
  {
1064
- "epoch": 35.12,
1065
- "learning_rate": 0.00046071693282123845,
1066
- "loss": 0.0023,
1067
- "step": 115830
1068
  },
1069
  {
1070
- "epoch": 35.42,
1071
- "learning_rate": 0.0004551580082204703,
1072
- "loss": 0.0021,
1073
- "step": 116820
1074
  },
1075
  {
1076
- "epoch": 35.72,
1077
- "learning_rate": 0.0004495990836197022,
1078
  "loss": 0.0023,
1079
- "step": 117810
1080
  },
1081
  {
1082
  "epoch": 36.0,
1083
- "eval_loss": 0.01675160974264145,
1084
- "eval_max_distance": 24,
1085
  "eval_mean_distance": 0,
1086
- "eval_runtime": 0.4591,
1087
- "eval_samples_per_second": 108.901,
1088
- "eval_steps_per_second": 4.356,
1089
- "step": 118728
1090
  },
1091
  {
1092
- "epoch": 36.02,
1093
- "learning_rate": 0.000444040159018934,
1094
- "loss": 0.0023,
1095
- "step": 118800
1096
  },
1097
  {
1098
- "epoch": 36.32,
1099
- "learning_rate": 0.0004384812344181659,
1100
- "loss": 0.0021,
1101
- "step": 119790
1102
  },
1103
  {
1104
- "epoch": 36.62,
1105
- "learning_rate": 0.0004329223098173978,
1106
- "loss": 0.0021,
1107
- "step": 120780
1108
  },
1109
  {
1110
- "epoch": 36.92,
1111
- "learning_rate": 0.0004273633852166296,
1112
- "loss": 0.002,
1113
- "step": 121770
1114
  },
1115
  {
1116
  "epoch": 37.0,
1117
- "eval_loss": 0.022139811888337135,
1118
- "eval_max_distance": 32,
1119
- "eval_mean_distance": 1,
1120
- "eval_runtime": 0.4713,
1121
- "eval_samples_per_second": 106.08,
1122
- "eval_steps_per_second": 4.243,
1123
- "step": 122026
1124
  },
1125
  {
1126
- "epoch": 37.22,
1127
- "learning_rate": 0.0004218044606158615,
1128
- "loss": 0.002,
1129
- "step": 122760
1130
  },
1131
  {
1132
- "epoch": 37.52,
1133
- "learning_rate": 0.00041624553601509335,
1134
  "loss": 0.0019,
1135
- "step": 123750
1136
  },
1137
  {
1138
- "epoch": 37.82,
1139
- "learning_rate": 0.00041068661141432515,
1140
- "loss": 0.0019,
1141
- "step": 124740
1142
  },
1143
  {
1144
  "epoch": 38.0,
1145
- "eval_loss": 0.02140805311501026,
1146
- "eval_max_distance": 32,
1147
- "eval_mean_distance": 1,
1148
- "eval_runtime": 0.4808,
1149
- "eval_samples_per_second": 104.001,
1150
- "eval_steps_per_second": 4.16,
1151
- "step": 125324
1152
  },
1153
  {
1154
- "epoch": 38.12,
1155
- "learning_rate": 0.00040512768681355706,
1156
- "loss": 0.0019,
1157
- "step": 125730
1158
  },
1159
  {
1160
- "epoch": 38.42,
1161
- "learning_rate": 0.0003995687622127889,
1162
  "loss": 0.0018,
1163
- "step": 126720
1164
  },
1165
  {
1166
- "epoch": 38.72,
1167
- "learning_rate": 0.0003940098376120208,
1168
  "loss": 0.0017,
1169
- "step": 127710
1170
  },
1171
  {
1172
  "epoch": 39.0,
1173
- "eval_loss": 0.018618840724229813,
1174
- "eval_max_distance": 19,
1175
  "eval_mean_distance": 0,
1176
- "eval_runtime": 0.4752,
1177
- "eval_samples_per_second": 105.222,
1178
- "eval_steps_per_second": 4.209,
1179
- "step": 128622
1180
  },
1181
  {
1182
- "epoch": 39.02,
1183
- "learning_rate": 0.00038845091301125263,
1184
- "loss": 0.002,
1185
- "step": 128700
1186
  },
1187
  {
1188
- "epoch": 39.32,
1189
- "learning_rate": 0.0003828919884104845,
1190
  "loss": 0.0016,
1191
- "step": 129690
1192
  },
1193
  {
1194
- "epoch": 39.62,
1195
- "learning_rate": 0.00037733306380971634,
1196
  "loss": 0.0017,
1197
- "step": 130680
1198
  },
1199
  {
1200
- "epoch": 39.92,
1201
- "learning_rate": 0.0003717741392089482,
1202
  "loss": 0.0017,
1203
- "step": 131670
1204
  },
1205
  {
1206
  "epoch": 40.0,
1207
- "eval_loss": 0.017086679115891457,
1208
- "eval_max_distance": 23,
1209
  "eval_mean_distance": 0,
1210
- "eval_runtime": 0.458,
1211
- "eval_samples_per_second": 109.178,
1212
- "eval_steps_per_second": 4.367,
1213
- "step": 131920
1214
  },
1215
  {
1216
- "epoch": 40.22,
1217
- "learning_rate": 0.00036621521460818,
1218
  "loss": 0.0015,
1219
- "step": 132660
1220
  },
1221
  {
1222
- "epoch": 40.52,
1223
- "learning_rate": 0.0003606562900074119,
1224
- "loss": 0.0016,
1225
- "step": 133650
1226
  },
1227
  {
1228
- "epoch": 40.82,
1229
- "learning_rate": 0.00035509736540664376,
1230
  "loss": 0.0016,
1231
- "step": 134640
1232
  },
1233
  {
1234
  "epoch": 41.0,
1235
- "eval_loss": 0.01638130471110344,
1236
- "eval_max_distance": 17,
1237
  "eval_mean_distance": 0,
1238
- "eval_runtime": 0.4581,
1239
- "eval_samples_per_second": 109.147,
1240
- "eval_steps_per_second": 4.366,
1241
- "step": 135218
1242
  },
1243
  {
1244
- "epoch": 41.12,
1245
- "learning_rate": 0.0003495384408058756,
1246
  "loss": 0.0015,
1247
- "step": 135630
1248
  },
1249
  {
1250
  "epoch": 41.43,
1251
- "learning_rate": 0.0003439795162051075,
1252
  "loss": 0.0014,
1253
- "step": 136620
1254
  },
1255
  {
1256
  "epoch": 41.73,
1257
- "learning_rate": 0.00033842059160433933,
1258
- "loss": 0.0015,
1259
- "step": 137610
1260
  },
1261
  {
1262
  "epoch": 42.0,
1263
- "eval_loss": 0.016585057601332664,
1264
- "eval_max_distance": 21,
1265
- "eval_mean_distance": 1,
1266
- "eval_runtime": 0.479,
1267
- "eval_samples_per_second": 104.393,
1268
- "eval_steps_per_second": 4.176,
1269
- "step": 138516
1270
  },
1271
  {
1272
  "epoch": 42.03,
1273
- "learning_rate": 0.0003328616670035712,
1274
- "loss": 0.0014,
1275
- "step": 138600
1276
  },
1277
  {
1278
  "epoch": 42.33,
1279
- "learning_rate": 0.00032730274240280304,
1280
- "loss": 0.0015,
1281
- "step": 139590
1282
  },
1283
  {
1284
  "epoch": 42.63,
1285
- "learning_rate": 0.00032174381780203495,
1286
- "loss": 0.0015,
1287
- "step": 140580
1288
  },
1289
  {
1290
  "epoch": 42.93,
1291
- "learning_rate": 0.00031618489320126675,
1292
- "loss": 0.0014,
1293
- "step": 141570
1294
  },
1295
  {
1296
  "epoch": 43.0,
1297
- "eval_loss": 0.016704820096492767,
1298
- "eval_max_distance": 21,
1299
  "eval_mean_distance": 0,
1300
- "eval_runtime": 0.4809,
1301
- "eval_samples_per_second": 103.976,
1302
- "eval_steps_per_second": 4.159,
1303
- "step": 141814
1304
  },
1305
  {
1306
  "epoch": 43.23,
1307
- "learning_rate": 0.0003106259686004986,
1308
- "loss": 0.0011,
1309
- "step": 142560
1310
  },
1311
  {
1312
  "epoch": 43.53,
1313
- "learning_rate": 0.0003050670439997305,
1314
- "loss": 0.0013,
1315
- "step": 143550
1316
  },
1317
  {
1318
- "epoch": 43.83,
1319
- "learning_rate": 0.0002995081193989623,
1320
- "loss": 0.0019,
1321
- "step": 144540
1322
  },
1323
  {
1324
  "epoch": 44.0,
1325
- "eval_loss": 0.019240867346525192,
1326
- "eval_max_distance": 32,
1327
- "eval_mean_distance": 1,
1328
- "eval_runtime": 0.6494,
1329
- "eval_samples_per_second": 76.999,
1330
- "eval_steps_per_second": 3.08,
1331
- "step": 145112
1332
  },
1333
  {
1334
- "epoch": 44.13,
1335
- "learning_rate": 0.00029394919479819423,
1336
  "loss": 0.0012,
1337
- "step": 145530
1338
  },
1339
  {
1340
- "epoch": 44.43,
1341
- "learning_rate": 0.00028839027019742603,
1342
  "loss": 0.0011,
1343
- "step": 146520
1344
  },
1345
  {
1346
- "epoch": 44.73,
1347
- "learning_rate": 0.0002828313455966579,
1348
- "loss": 0.0011,
1349
- "step": 147510
1350
  },
1351
  {
1352
  "epoch": 45.0,
1353
- "eval_loss": 0.02091757208108902,
1354
- "eval_max_distance": 27,
1355
- "eval_mean_distance": 1,
1356
- "eval_runtime": 0.4646,
1357
- "eval_samples_per_second": 107.608,
1358
- "eval_steps_per_second": 4.304,
1359
- "step": 148410
1360
- },
1361
- {
1362
- "epoch": 45.03,
1363
- "learning_rate": 0.0002772724209958898,
1364
- "loss": 0.0011,
1365
- "step": 148500
1366
  },
1367
  {
1368
- "epoch": 45.33,
1369
- "learning_rate": 0.0002717134963951216,
1370
- "loss": 0.0011,
1371
- "step": 149490
1372
  },
1373
  {
1374
- "epoch": 45.63,
1375
- "learning_rate": 0.0002661545717943535,
1376
  "loss": 0.001,
1377
- "step": 150480
1378
  },
1379
  {
1380
- "epoch": 45.93,
1381
- "learning_rate": 0.00026059564719358537,
1382
  "loss": 0.0011,
1383
- "step": 151470
1384
  },
1385
  {
1386
  "epoch": 46.0,
1387
- "eval_loss": 0.02175173908472061,
1388
- "eval_max_distance": 23,
1389
  "eval_mean_distance": 0,
1390
- "eval_runtime": 0.4863,
1391
- "eval_samples_per_second": 102.827,
1392
- "eval_steps_per_second": 4.113,
1393
- "step": 151708
1394
  },
1395
  {
1396
- "epoch": 46.23,
1397
- "learning_rate": 0.00025503672259281717,
1398
  "loss": 0.001,
1399
- "step": 152460
1400
  },
1401
  {
1402
- "epoch": 46.53,
1403
- "learning_rate": 0.0002494777979920491,
1404
- "loss": 0.001,
1405
- "step": 153450
1406
  },
1407
  {
1408
- "epoch": 46.83,
1409
- "learning_rate": 0.0002439188733912809,
1410
  "loss": 0.001,
1411
- "step": 154440
1412
  },
1413
  {
1414
  "epoch": 47.0,
1415
- "eval_loss": 0.01951581984758377,
1416
- "eval_max_distance": 25,
1417
  "eval_mean_distance": 0,
1418
- "eval_runtime": 0.4608,
1419
- "eval_samples_per_second": 108.512,
1420
- "eval_steps_per_second": 4.34,
1421
- "step": 155006
1422
  },
1423
  {
1424
- "epoch": 47.13,
1425
- "learning_rate": 0.0002383599487905128,
1426
- "loss": 0.001,
1427
- "step": 155430
1428
  },
1429
  {
1430
- "epoch": 47.43,
1431
- "learning_rate": 0.00023280102418974464,
1432
  "loss": 0.0009,
1433
- "step": 156420
1434
  },
1435
  {
1436
- "epoch": 47.73,
1437
- "learning_rate": 0.00022724209958897647,
1438
  "loss": 0.0009,
1439
- "step": 157410
1440
  },
1441
  {
1442
  "epoch": 48.0,
1443
- "eval_loss": 0.01657327450811863,
1444
- "eval_max_distance": 15,
1445
  "eval_mean_distance": 0,
1446
- "eval_runtime": 0.4688,
1447
- "eval_samples_per_second": 106.651,
1448
- "eval_steps_per_second": 4.266,
1449
- "step": 158304
1450
  },
1451
  {
1452
- "epoch": 48.03,
1453
- "learning_rate": 0.00022168317498820833,
1454
  "loss": 0.0009,
1455
- "step": 158400
1456
  },
1457
  {
1458
- "epoch": 48.33,
1459
- "learning_rate": 0.0002161242503874402,
1460
  "loss": 0.0008,
1461
- "step": 159390
1462
  },
1463
  {
1464
- "epoch": 48.63,
1465
- "learning_rate": 0.00021056532578667207,
1466
- "loss": 0.0008,
1467
- "step": 160380
1468
  },
1469
  {
1470
- "epoch": 48.93,
1471
- "learning_rate": 0.00020500640118590392,
1472
- "loss": 0.0008,
1473
- "step": 161370
1474
  },
1475
  {
1476
  "epoch": 49.0,
1477
- "eval_loss": 0.020961837843060493,
1478
- "eval_max_distance": 31,
1479
- "eval_mean_distance": 1,
1480
- "eval_runtime": 0.4893,
1481
- "eval_samples_per_second": 102.188,
1482
- "eval_steps_per_second": 4.088,
1483
- "step": 161602
1484
  },
1485
  {
1486
- "epoch": 49.23,
1487
- "learning_rate": 0.00019944747658513578,
1488
  "loss": 0.0008,
1489
- "step": 162360
1490
  },
1491
  {
1492
- "epoch": 49.53,
1493
- "learning_rate": 0.00019388855198436764,
1494
  "loss": 0.0008,
1495
- "step": 163350
1496
  },
1497
  {
1498
- "epoch": 49.83,
1499
- "learning_rate": 0.0001883296273835995,
1500
  "loss": 0.0008,
1501
- "step": 164340
1502
  },
1503
  {
1504
  "epoch": 50.0,
1505
- "eval_loss": 0.022983456030488014,
1506
- "eval_max_distance": 22,
1507
  "eval_mean_distance": 0,
1508
- "eval_runtime": 0.479,
1509
- "eval_samples_per_second": 104.39,
1510
- "eval_steps_per_second": 4.176,
1511
- "step": 164900
1512
  },
1513
  {
1514
- "epoch": 50.13,
1515
- "learning_rate": 0.00018277070278283135,
1516
- "loss": 0.0008,
1517
- "step": 165330
1518
  },
1519
  {
1520
- "epoch": 50.43,
1521
- "learning_rate": 0.0001772117781820632,
1522
  "loss": 0.0007,
1523
- "step": 166320
1524
  },
1525
  {
1526
- "epoch": 50.73,
1527
- "learning_rate": 0.00017165285358129506,
1528
- "loss": 0.0008,
1529
- "step": 167310
1530
  },
1531
  {
1532
  "epoch": 51.0,
1533
- "eval_loss": 0.018444916233420372,
1534
- "eval_max_distance": 15,
1535
  "eval_mean_distance": 0,
1536
- "eval_runtime": 0.4866,
1537
- "eval_samples_per_second": 102.75,
1538
- "eval_steps_per_second": 4.11,
1539
- "step": 168198
1540
  },
1541
  {
1542
- "epoch": 51.03,
1543
- "learning_rate": 0.00016609392898052691,
1544
  "loss": 0.0007,
1545
- "step": 168300
1546
  },
1547
  {
1548
- "epoch": 51.33,
1549
- "learning_rate": 0.0001605350043797588,
1550
  "loss": 0.0007,
1551
- "step": 169290
1552
  },
1553
  {
1554
- "epoch": 51.63,
1555
- "learning_rate": 0.00015497607977899065,
1556
- "loss": 0.0007,
1557
- "step": 170280
1558
  },
1559
  {
1560
- "epoch": 51.93,
1561
- "learning_rate": 0.00014941715517822248,
1562
- "loss": 0.0007,
1563
- "step": 171270
1564
  },
1565
  {
1566
  "epoch": 52.0,
1567
- "eval_loss": 0.01832015998661518,
1568
- "eval_max_distance": 15,
1569
  "eval_mean_distance": 0,
1570
- "eval_runtime": 0.4672,
1571
- "eval_samples_per_second": 107.025,
1572
- "eval_steps_per_second": 4.281,
1573
- "step": 171496
1574
  },
1575
  {
1576
- "epoch": 52.23,
1577
- "learning_rate": 0.00014385823057745434,
1578
  "loss": 0.0006,
1579
- "step": 172260
1580
  },
1581
  {
1582
- "epoch": 52.53,
1583
- "learning_rate": 0.00013829930597668622,
1584
  "loss": 0.0006,
1585
- "step": 173250
1586
  },
1587
  {
1588
- "epoch": 52.83,
1589
- "learning_rate": 0.00013274038137591808,
1590
  "loss": 0.0006,
1591
- "step": 174240
1592
  },
1593
  {
1594
  "epoch": 53.0,
1595
- "eval_loss": 0.023398304358124733,
1596
- "eval_max_distance": 32,
1597
- "eval_mean_distance": 1,
1598
- "eval_runtime": 0.4822,
1599
- "eval_samples_per_second": 103.698,
1600
- "eval_steps_per_second": 4.148,
1601
- "step": 174794
1602
  },
1603
  {
1604
- "epoch": 53.13,
1605
- "learning_rate": 0.0001271814567751499,
1606
  "loss": 0.0006,
1607
- "step": 175230
1608
  },
1609
  {
1610
- "epoch": 53.43,
1611
- "learning_rate": 0.00012162253217438179,
1612
- "loss": 0.0006,
1613
- "step": 176220
1614
  },
1615
  {
1616
- "epoch": 53.73,
1617
- "learning_rate": 0.00011606360757361364,
1618
  "loss": 0.0005,
1619
- "step": 177210
1620
  },
1621
  {
1622
  "epoch": 54.0,
1623
- "eval_loss": 0.022733934223651886,
1624
- "eval_max_distance": 24,
1625
  "eval_mean_distance": 0,
1626
- "eval_runtime": 0.4789,
1627
- "eval_samples_per_second": 104.41,
1628
- "eval_steps_per_second": 4.176,
1629
- "step": 178092
1630
  },
1631
  {
1632
- "epoch": 54.03,
1633
- "learning_rate": 0.0001105046829728455,
1634
  "loss": 0.0005,
1635
- "step": 178200
1636
  },
1637
  {
1638
- "epoch": 54.33,
1639
- "learning_rate": 0.00010494575837207735,
1640
- "loss": 0.0005,
1641
- "step": 179190
1642
  },
1643
  {
1644
- "epoch": 54.63,
1645
- "learning_rate": 9.938683377130921e-05,
1646
  "loss": 0.0005,
1647
- "step": 180180
1648
  },
1649
  {
1650
- "epoch": 54.93,
1651
- "learning_rate": 9.382790917054107e-05,
1652
- "loss": 0.0004,
1653
- "step": 181170
1654
  },
1655
  {
1656
  "epoch": 55.0,
1657
- "eval_loss": 0.018815917894244194,
1658
- "eval_max_distance": 15,
1659
  "eval_mean_distance": 0,
1660
- "eval_runtime": 0.4798,
1661
- "eval_samples_per_second": 104.21,
1662
- "eval_steps_per_second": 4.168,
1663
- "step": 181390
1664
  },
1665
  {
1666
- "epoch": 55.23,
1667
- "learning_rate": 8.826898456977294e-05,
1668
  "loss": 0.0005,
1669
- "step": 182160
1670
  },
1671
  {
1672
- "epoch": 55.53,
1673
- "learning_rate": 8.271005996900478e-05,
1674
  "loss": 0.0004,
1675
- "step": 183150
1676
  },
1677
  {
1678
- "epoch": 55.83,
1679
- "learning_rate": 7.715113536823665e-05,
1680
  "loss": 0.0005,
1681
- "step": 184140
1682
  },
1683
  {
1684
  "epoch": 56.0,
1685
- "eval_loss": 0.01906018890440464,
1686
- "eval_max_distance": 15,
1687
  "eval_mean_distance": 0,
1688
- "eval_runtime": 0.48,
1689
- "eval_samples_per_second": 104.168,
1690
- "eval_steps_per_second": 4.167,
1691
- "step": 184688
1692
  },
1693
  {
1694
- "epoch": 56.13,
1695
- "learning_rate": 7.15922107674685e-05,
1696
  "loss": 0.0004,
1697
- "step": 185130
1698
  },
1699
  {
1700
- "epoch": 56.43,
1701
- "learning_rate": 6.603328616670036e-05,
1702
  "loss": 0.0004,
1703
- "step": 186120
1704
  },
1705
  {
1706
- "epoch": 56.73,
1707
- "learning_rate": 6.0474361565932214e-05,
1708
  "loss": 0.0004,
1709
- "step": 187110
1710
  },
1711
  {
1712
  "epoch": 57.0,
1713
- "eval_loss": 0.018282707780599594,
1714
- "eval_max_distance": 15,
1715
  "eval_mean_distance": 0,
1716
- "eval_runtime": 0.4797,
1717
- "eval_samples_per_second": 104.233,
1718
- "eval_steps_per_second": 4.169,
1719
- "step": 187986
1720
  },
1721
  {
1722
- "epoch": 57.03,
1723
- "learning_rate": 5.491543696516407e-05,
1724
  "loss": 0.0004,
1725
- "step": 188100
1726
  },
1727
  {
1728
- "epoch": 57.33,
1729
- "learning_rate": 4.935651236439593e-05,
1730
  "loss": 0.0004,
1731
- "step": 189090
1732
  },
1733
  {
1734
- "epoch": 57.63,
1735
- "learning_rate": 4.379758776362779e-05,
1736
  "loss": 0.0004,
1737
- "step": 190080
1738
  },
1739
  {
1740
- "epoch": 57.94,
1741
- "learning_rate": 3.823866316285965e-05,
1742
  "loss": 0.0003,
1743
- "step": 191070
1744
  },
1745
  {
1746
  "epoch": 58.0,
1747
- "eval_loss": 0.018019111827015877,
1748
- "eval_max_distance": 15,
1749
  "eval_mean_distance": 0,
1750
- "eval_runtime": 0.4619,
1751
- "eval_samples_per_second": 108.242,
1752
- "eval_steps_per_second": 4.33,
1753
- "step": 191284
1754
  },
1755
  {
1756
- "epoch": 58.24,
1757
- "learning_rate": 3.2679738562091506e-05,
1758
- "loss": 0.0004,
1759
- "step": 192060
1760
  },
1761
  {
1762
- "epoch": 58.54,
1763
- "learning_rate": 2.7120813961323362e-05,
1764
  "loss": 0.0004,
1765
- "step": 193050
1766
  },
1767
  {
1768
- "epoch": 58.84,
1769
- "learning_rate": 2.1561889360555218e-05,
1770
  "loss": 0.0003,
1771
- "step": 194040
1772
  },
1773
  {
1774
  "epoch": 59.0,
1775
- "eval_loss": 0.01795811764895916,
1776
- "eval_max_distance": 15,
1777
  "eval_mean_distance": 0,
1778
- "eval_runtime": 0.475,
1779
- "eval_samples_per_second": 105.265,
1780
- "eval_steps_per_second": 4.211,
1781
- "step": 194582
1782
  },
1783
  {
1784
- "epoch": 59.14,
1785
- "learning_rate": 1.6002964759787074e-05,
1786
- "loss": 0.0004,
1787
- "step": 195030
1788
  },
1789
  {
1790
- "epoch": 59.44,
1791
- "learning_rate": 1.0444040159018933e-05,
1792
- "loss": 0.0004,
1793
- "step": 196020
1794
  },
1795
  {
1796
- "epoch": 59.74,
1797
- "learning_rate": 4.885115558250792e-06,
1798
  "loss": 0.0004,
1799
- "step": 197010
1800
  },
1801
  {
1802
  "epoch": 60.0,
1803
- "eval_loss": 0.017678335309028625,
1804
- "eval_max_distance": 15,
1805
  "eval_mean_distance": 0,
1806
- "eval_runtime": 0.4798,
1807
- "eval_samples_per_second": 104.214,
1808
- "eval_steps_per_second": 4.169,
1809
- "step": 197880
1810
  },
1811
  {
1812
  "epoch": 60.0,
1813
- "step": 197880,
1814
- "total_flos": 1.1400109636858675e+17,
1815
- "train_loss": 0.031872519274052644,
1816
- "train_runtime": 16366.2485,
1817
- "train_samples_per_second": 362.656,
1818
- "train_steps_per_second": 12.091
1819
  }
1820
  ],
1821
- "logging_steps": 990,
1822
- "max_steps": 197880,
1823
  "num_train_epochs": 60,
1824
- "save_steps": 1979,
1825
- "total_flos": 1.1400109636858675e+17,
1826
  "trial_name": null,
1827
  "trial_params": null
1828
  }
 
3
  "best_model_checkpoint": null,
4
  "epoch": 60.0,
5
  "eval_steps": 500,
6
+ "global_step": 200040,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
  "epoch": 0.0,
13
+ "learning_rate": 4.999000199960008e-08,
14
+ "loss": 13.1619,
15
  "step": 1
16
  },
17
  {
18
  "epoch": 0.3,
19
+ "learning_rate": 5.003999200159968e-05,
20
+ "loss": 3.3531,
21
+ "step": 1001
22
  },
23
  {
24
  "epoch": 0.6,
25
+ "learning_rate": 0.00010007998400319936,
26
+ "loss": 0.3338,
27
+ "step": 2002
28
  },
29
  {
30
  "epoch": 0.9,
31
+ "learning_rate": 0.00015011997600479905,
32
+ "loss": 0.2251,
33
+ "step": 3003
34
  },
35
  {
36
  "epoch": 1.0,
37
+ "eval_loss": 0.118980273604393,
38
+ "eval_max_distance": 29,
39
+ "eval_mean_distance": 3,
40
+ "eval_runtime": 0.3245,
41
+ "eval_samples_per_second": 154.076,
42
+ "eval_steps_per_second": 6.163,
43
+ "step": 3334
44
  },
45
  {
46
  "epoch": 1.2,
47
+ "learning_rate": 0.00020015996800639872,
48
+ "loss": 0.1668,
49
+ "step": 4004
50
  },
51
  {
52
  "epoch": 1.5,
53
+ "learning_rate": 0.0002501999600079984,
54
+ "loss": 0.1375,
55
+ "step": 5005
56
  },
57
  {
58
  "epoch": 1.8,
59
+ "learning_rate": 0.0003002399520095981,
60
  "loss": 0.1179,
61
+ "step": 6006
62
  },
63
  {
64
  "epoch": 2.0,
65
+ "eval_loss": 0.057394467294216156,
66
+ "eval_max_distance": 31,
67
+ "eval_mean_distance": 2,
68
+ "eval_runtime": 0.2749,
69
+ "eval_samples_per_second": 181.873,
70
+ "eval_steps_per_second": 7.275,
71
+ "step": 6668
72
  },
73
  {
74
  "epoch": 2.1,
75
+ "learning_rate": 0.00035027994401119777,
76
+ "loss": 0.0992,
77
+ "step": 7007
78
  },
79
  {
80
  "epoch": 2.4,
81
+ "learning_rate": 0.00040031993601279744,
82
+ "loss": 0.0886,
83
+ "step": 8008
84
  },
85
  {
86
  "epoch": 2.7,
87
+ "learning_rate": 0.0004503599280143971,
88
+ "loss": 0.0848,
89
+ "step": 9009
90
  },
91
  {
92
  "epoch": 3.0,
93
+ "eval_loss": 0.043563079088926315,
94
+ "eval_max_distance": 15,
95
  "eval_mean_distance": 1,
96
+ "eval_runtime": 0.2795,
97
+ "eval_samples_per_second": 178.881,
98
+ "eval_steps_per_second": 7.155,
99
+ "step": 10002
100
  },
101
  {
102
  "epoch": 3.0,
103
+ "learning_rate": 0.0005003999200159968,
104
+ "loss": 0.0757,
105
+ "step": 10010
106
  },
107
  {
108
  "epoch": 3.3,
109
+ "learning_rate": 0.0005504399120175964,
110
+ "loss": 0.0681,
111
+ "step": 11011
112
  },
113
  {
114
  "epoch": 3.6,
115
+ "learning_rate": 0.0006004799040191962,
116
  "loss": 0.0653,
117
+ "step": 12012
118
  },
119
  {
120
  "epoch": 3.9,
121
+ "learning_rate": 0.0006505198960207959,
122
+ "loss": 0.0618,
123
+ "step": 13013
124
  },
125
  {
126
  "epoch": 4.0,
127
+ "eval_loss": 0.035945579409599304,
128
+ "eval_max_distance": 20,
129
  "eval_mean_distance": 1,
130
+ "eval_runtime": 0.2802,
131
+ "eval_samples_per_second": 178.422,
132
+ "eval_steps_per_second": 7.137,
133
+ "step": 13336
134
  },
135
  {
136
  "epoch": 4.2,
137
+ "learning_rate": 0.0007005598880223955,
138
+ "loss": 0.0564,
139
+ "step": 14014
140
  },
141
  {
142
  "epoch": 4.5,
143
+ "learning_rate": 0.0007505998800239953,
144
+ "loss": 0.0537,
145
+ "step": 15015
146
  },
147
  {
148
  "epoch": 4.8,
149
+ "learning_rate": 0.0008006398720255949,
150
+ "loss": 0.0532,
151
+ "step": 16016
152
  },
153
  {
154
  "epoch": 5.0,
155
+ "eval_loss": 0.031485434621572495,
156
+ "eval_max_distance": 11,
157
+ "eval_mean_distance": 0,
158
+ "eval_runtime": 0.2717,
159
+ "eval_samples_per_second": 184.007,
160
+ "eval_steps_per_second": 7.36,
161
+ "step": 16670
162
  },
163
  {
164
  "epoch": 5.1,
165
+ "learning_rate": 0.0008506798640271945,
166
+ "loss": 0.05,
167
+ "step": 17017
168
  },
169
  {
170
  "epoch": 5.4,
171
+ "learning_rate": 0.0009007198560287942,
172
+ "loss": 0.0468,
173
+ "step": 18018
174
  },
175
  {
176
  "epoch": 5.7,
177
+ "learning_rate": 0.000950759848030394,
178
+ "loss": 0.0446,
179
+ "step": 19019
180
  },
181
  {
182
  "epoch": 6.0,
183
+ "eval_loss": 0.0298615675419569,
184
+ "eval_max_distance": 16,
185
+ "eval_mean_distance": 0,
186
+ "eval_runtime": 0.2573,
187
+ "eval_samples_per_second": 194.348,
188
+ "eval_steps_per_second": 7.774,
189
+ "step": 20004
190
  },
191
  {
192
  "epoch": 6.0,
193
+ "learning_rate": 0.000999911128885334,
194
+ "loss": 0.0465,
195
+ "step": 20020
196
  },
197
  {
198
+ "epoch": 6.31,
199
+ "learning_rate": 0.0009943511297740451,
200
+ "loss": 0.0384,
201
+ "step": 21021
202
  },
203
  {
204
+ "epoch": 6.61,
205
+ "learning_rate": 0.0009887911306627564,
206
+ "loss": 0.0378,
207
+ "step": 22022
208
  },
209
  {
210
+ "epoch": 6.91,
211
+ "learning_rate": 0.0009832311315514674,
212
+ "loss": 0.0388,
213
+ "step": 23023
214
  },
215
  {
216
  "epoch": 7.0,
217
+ "eval_loss": 0.029532546177506447,
218
+ "eval_max_distance": 15,
219
+ "eval_mean_distance": 0,
220
+ "eval_runtime": 0.2674,
221
+ "eval_samples_per_second": 187.01,
222
+ "eval_steps_per_second": 7.48,
223
+ "step": 23338
224
  },
225
  {
226
+ "epoch": 7.21,
227
+ "learning_rate": 0.0009776711324401787,
228
+ "loss": 0.0336,
229
+ "step": 24024
230
  },
231
  {
232
+ "epoch": 7.51,
233
+ "learning_rate": 0.0009721111333288898,
234
+ "loss": 0.032,
235
+ "step": 25025
236
  },
237
  {
238
+ "epoch": 7.81,
239
+ "learning_rate": 0.000966551134217601,
240
+ "loss": 0.0311,
241
+ "step": 26026
242
  },
243
  {
244
  "epoch": 8.0,
245
+ "eval_loss": 0.02873826026916504,
246
+ "eval_max_distance": 15,
247
+ "eval_mean_distance": 0,
248
+ "eval_runtime": 0.2674,
249
+ "eval_samples_per_second": 186.98,
250
+ "eval_steps_per_second": 7.479,
251
+ "step": 26672
252
  },
253
  {
254
+ "epoch": 8.11,
255
+ "learning_rate": 0.0009609911351063121,
256
+ "loss": 0.0304,
257
+ "step": 27027
258
  },
259
  {
260
  "epoch": 8.41,
261
+ "learning_rate": 0.0009554311359950233,
262
+ "loss": 0.0267,
263
+ "step": 28028
264
  },
265
  {
266
  "epoch": 8.71,
267
+ "learning_rate": 0.0009498711368837344,
268
+ "loss": 0.0269,
269
+ "step": 29029
270
  },
271
  {
272
  "epoch": 9.0,
273
+ "eval_loss": 0.02408006228506565,
274
+ "eval_max_distance": 15,
275
+ "eval_mean_distance": 0,
276
+ "eval_runtime": 0.2548,
277
+ "eval_samples_per_second": 196.242,
278
+ "eval_steps_per_second": 7.85,
279
+ "step": 30006
280
  },
281
  {
282
  "epoch": 9.01,
283
+ "learning_rate": 0.0009443111377724454,
284
+ "loss": 0.0269,
285
+ "step": 30030
286
  },
287
  {
288
  "epoch": 9.31,
289
+ "learning_rate": 0.0009387511386611567,
290
+ "loss": 0.022,
291
+ "step": 31031
292
  },
293
  {
294
  "epoch": 9.61,
295
+ "learning_rate": 0.0009331911395498677,
296
+ "loss": 0.0231,
297
+ "step": 32032
298
  },
299
  {
300
  "epoch": 9.91,
301
+ "learning_rate": 0.000927631140438579,
302
+ "loss": 0.0232,
303
+ "step": 33033
304
  },
305
  {
306
  "epoch": 10.0,
307
+ "eval_loss": 0.022765493020415306,
308
+ "eval_max_distance": 13,
309
+ "eval_mean_distance": 0,
310
+ "eval_runtime": 0.2488,
311
+ "eval_samples_per_second": 200.959,
312
+ "eval_steps_per_second": 8.038,
313
+ "step": 33340
314
  },
315
  {
316
  "epoch": 10.21,
317
+ "learning_rate": 0.00092207114132729,
318
+ "loss": 0.0199,
319
+ "step": 34034
320
  },
321
  {
322
  "epoch": 10.51,
323
+ "learning_rate": 0.0009165111422160013,
324
+ "loss": 0.0196,
325
+ "step": 35035
326
  },
327
  {
328
  "epoch": 10.81,
329
+ "learning_rate": 0.0009109511431047123,
330
  "loss": 0.0203,
331
+ "step": 36036
332
  },
333
  {
334
  "epoch": 11.0,
335
+ "eval_loss": 0.024308495223522186,
336
+ "eval_max_distance": 16,
337
+ "eval_mean_distance": 0,
338
+ "eval_runtime": 0.2617,
339
+ "eval_samples_per_second": 191.039,
340
+ "eval_steps_per_second": 7.642,
341
+ "step": 36674
342
  },
343
  {
344
  "epoch": 11.11,
345
+ "learning_rate": 0.0009053911439934236,
346
+ "loss": 0.0186,
347
+ "step": 37037
348
  },
349
  {
350
  "epoch": 11.41,
351
+ "learning_rate": 0.0008998311448821347,
352
+ "loss": 0.0167,
353
+ "step": 38038
354
  },
355
  {
356
  "epoch": 11.71,
357
+ "learning_rate": 0.0008942711457708459,
358
+ "loss": 0.0173,
359
+ "step": 39039
360
  },
361
  {
362
  "epoch": 12.0,
363
+ "eval_loss": 0.0250206608325243,
364
+ "eval_max_distance": 15,
365
+ "eval_mean_distance": 0,
366
+ "eval_runtime": 0.2565,
367
+ "eval_samples_per_second": 194.951,
368
+ "eval_steps_per_second": 7.798,
369
+ "step": 40008
370
  },
371
  {
372
  "epoch": 12.01,
373
+ "learning_rate": 0.000888711146659557,
374
+ "loss": 0.0178,
375
+ "step": 40040
376
  },
377
  {
378
  "epoch": 12.31,
379
+ "learning_rate": 0.0008831511475482682,
380
+ "loss": 0.0146,
381
+ "step": 41041
382
  },
383
  {
384
  "epoch": 12.61,
385
+ "learning_rate": 0.0008775911484369793,
386
+ "loss": 0.0149,
387
+ "step": 42042
388
  },
389
  {
390
  "epoch": 12.91,
391
+ "learning_rate": 0.0008720311493256904,
392
+ "loss": 0.0151,
393
+ "step": 43043
394
  },
395
  {
396
  "epoch": 13.0,
397
+ "eval_loss": 0.024401402100920677,
398
+ "eval_max_distance": 9,
399
+ "eval_mean_distance": 0,
400
+ "eval_runtime": 0.2582,
401
+ "eval_samples_per_second": 193.662,
402
+ "eval_steps_per_second": 7.746,
403
+ "step": 43342
404
  },
405
  {
406
  "epoch": 13.21,
407
+ "learning_rate": 0.0008664711502144016,
408
+ "loss": 0.0138,
409
+ "step": 44044
410
  },
411
  {
412
  "epoch": 13.51,
413
+ "learning_rate": 0.0008609111511031127,
414
+ "loss": 0.0137,
415
+ "step": 45045
416
  },
417
  {
418
  "epoch": 13.81,
419
+ "learning_rate": 0.0008553511519918239,
420
+ "loss": 0.0136,
421
+ "step": 46046
422
  },
423
  {
424
  "epoch": 14.0,
425
+ "eval_loss": 0.023412013426423073,
426
+ "eval_max_distance": 15,
427
+ "eval_mean_distance": 0,
428
+ "eval_runtime": 0.2465,
429
+ "eval_samples_per_second": 202.834,
430
+ "eval_steps_per_second": 8.113,
431
+ "step": 46676
432
  },
433
  {
434
  "epoch": 14.11,
435
+ "learning_rate": 0.000849791152880535,
436
+ "loss": 0.0126,
437
+ "step": 47047
438
  },
439
  {
440
  "epoch": 14.41,
441
+ "learning_rate": 0.0008442311537692462,
442
  "loss": 0.0121,
443
+ "step": 48048
444
  },
445
  {
446
  "epoch": 14.71,
447
+ "learning_rate": 0.0008386711546579573,
448
+ "loss": 0.0123,
449
+ "step": 49049
450
  },
451
  {
452
  "epoch": 15.0,
453
+ "eval_loss": 0.022092605009675026,
454
+ "eval_max_distance": 9,
455
+ "eval_mean_distance": 0,
456
+ "eval_runtime": 0.2607,
457
+ "eval_samples_per_second": 191.77,
458
+ "eval_steps_per_second": 7.671,
459
+ "step": 50010
460
  },
461
  {
462
  "epoch": 15.01,
463
+ "learning_rate": 0.0008331111555466685,
464
+ "loss": 0.0125,
465
+ "step": 50050
466
  },
467
  {
468
  "epoch": 15.31,
469
+ "learning_rate": 0.0008275511564353796,
470
+ "loss": 0.0101,
471
+ "step": 51051
472
  },
473
  {
474
  "epoch": 15.61,
475
+ "learning_rate": 0.0008219911573240908,
476
+ "loss": 0.0108,
477
+ "step": 52052
478
  },
479
  {
480
  "epoch": 15.91,
481
+ "learning_rate": 0.0008164311582128019,
482
+ "loss": 0.0113,
483
+ "step": 53053
484
  },
485
  {
486
  "epoch": 16.0,
487
+ "eval_loss": 0.024386152625083923,
488
+ "eval_max_distance": 12,
489
+ "eval_mean_distance": 0,
490
+ "eval_runtime": 0.2455,
491
+ "eval_samples_per_second": 203.682,
492
+ "eval_steps_per_second": 8.147,
493
+ "step": 53344
494
  },
495
  {
496
  "epoch": 16.21,
497
+ "learning_rate": 0.0008108711591015131,
498
+ "loss": 0.0099,
499
+ "step": 54054
500
  },
501
  {
502
  "epoch": 16.51,
503
+ "learning_rate": 0.0008053111599902242,
504
+ "loss": 0.0096,
505
+ "step": 55055
506
  },
507
  {
508
  "epoch": 16.81,
509
+ "learning_rate": 0.0007997511608789353,
510
+ "loss": 0.01,
511
+ "step": 56056
512
  },
513
  {
514
  "epoch": 17.0,
515
+ "eval_loss": 0.02255043014883995,
516
+ "eval_max_distance": 13,
517
+ "eval_mean_distance": 0,
518
+ "eval_runtime": 0.2506,
519
+ "eval_samples_per_second": 199.486,
520
+ "eval_steps_per_second": 7.979,
521
+ "step": 56678
522
  },
523
  {
524
  "epoch": 17.11,
525
+ "learning_rate": 0.0007941911617676465,
526
+ "loss": 0.0093,
527
+ "step": 57057
528
  },
529
  {
530
  "epoch": 17.41,
531
+ "learning_rate": 0.0007886311626563576,
532
+ "loss": 0.0087,
533
+ "step": 58058
534
  },
535
  {
536
  "epoch": 17.71,
537
+ "learning_rate": 0.0007830711635450687,
538
+ "loss": 0.0089,
539
+ "step": 59059
540
  },
541
  {
542
  "epoch": 18.0,
543
+ "eval_loss": 0.027119183912873268,
544
+ "eval_max_distance": 13,
545
+ "eval_mean_distance": 0,
546
+ "eval_runtime": 0.2424,
547
+ "eval_samples_per_second": 206.232,
548
+ "eval_steps_per_second": 8.249,
549
+ "step": 60012
550
  },
551
  {
552
  "epoch": 18.01,
553
+ "learning_rate": 0.0007775111644337799,
554
+ "loss": 0.0091,
555
+ "step": 60060
556
  },
557
  {
558
  "epoch": 18.31,
559
+ "learning_rate": 0.0007719511653224912,
560
  "loss": 0.0075,
561
+ "step": 61061
562
  },
563
  {
564
  "epoch": 18.61,
565
+ "learning_rate": 0.0007663911662112022,
566
+ "loss": 0.0079,
567
+ "step": 62062
568
  },
569
  {
570
+ "epoch": 18.92,
571
+ "learning_rate": 0.0007608311670999134,
572
+ "loss": 0.0085,
573
+ "step": 63063
574
  },
575
  {
576
  "epoch": 19.0,
577
+ "eval_loss": 0.024822326377034187,
578
+ "eval_max_distance": 13,
579
+ "eval_mean_distance": 0,
580
+ "eval_runtime": 0.2416,
581
+ "eval_samples_per_second": 206.915,
582
+ "eval_steps_per_second": 8.277,
583
+ "step": 63346
584
  },
585
  {
586
+ "epoch": 19.22,
587
+ "learning_rate": 0.0007552711679886245,
588
+ "loss": 0.0071,
589
+ "step": 64064
590
  },
591
  {
592
+ "epoch": 19.52,
593
+ "learning_rate": 0.0007497111688773357,
594
+ "loss": 0.0074,
595
+ "step": 65065
596
  },
597
  {
598
+ "epoch": 19.82,
599
+ "learning_rate": 0.0007441511697660468,
600
+ "loss": 0.0074,
601
+ "step": 66066
602
  },
603
  {
604
  "epoch": 20.0,
605
+ "eval_loss": 0.027729548513889313,
606
+ "eval_max_distance": 12,
607
+ "eval_mean_distance": 0,
608
+ "eval_runtime": 0.2481,
609
+ "eval_samples_per_second": 201.568,
610
+ "eval_steps_per_second": 8.063,
611
+ "step": 66680
612
  },
613
  {
614
+ "epoch": 20.12,
615
+ "learning_rate": 0.000738591170654758,
616
+ "loss": 0.007,
617
+ "step": 67067
618
  },
619
  {
620
+ "epoch": 20.42,
621
+ "learning_rate": 0.0007330311715434691,
622
+ "loss": 0.0061,
623
+ "step": 68068
624
  },
625
  {
626
+ "epoch": 20.72,
627
+ "learning_rate": 0.0007274711724321802,
628
+ "loss": 0.007,
629
+ "step": 69069
630
  },
631
  {
632
  "epoch": 21.0,
633
+ "eval_loss": 0.030854225158691406,
634
+ "eval_max_distance": 13,
635
+ "eval_mean_distance": 0,
636
+ "eval_runtime": 0.2457,
637
+ "eval_samples_per_second": 203.54,
638
+ "eval_steps_per_second": 8.142,
639
+ "step": 70014
640
  },
641
  {
642
+ "epoch": 21.02,
643
+ "learning_rate": 0.0007219111733208914,
644
+ "loss": 0.0069,
645
+ "step": 70070
646
  },
647
  {
648
+ "epoch": 21.32,
649
+ "learning_rate": 0.0007163511742096025,
650
+ "loss": 0.006,
651
+ "step": 71071
652
  },
653
  {
654
+ "epoch": 21.62,
655
+ "learning_rate": 0.0007107911750983137,
656
+ "loss": 0.0061,
657
+ "step": 72072
658
  },
659
  {
660
+ "epoch": 21.92,
661
+ "learning_rate": 0.0007052311759870248,
662
+ "loss": 0.0066,
663
+ "step": 73073
664
  },
665
  {
666
  "epoch": 22.0,
667
+ "eval_loss": 0.030563361942768097,
668
+ "eval_max_distance": 11,
669
+ "eval_mean_distance": 0,
670
+ "eval_runtime": 0.2419,
671
+ "eval_samples_per_second": 206.734,
672
+ "eval_steps_per_second": 8.269,
673
+ "step": 73348
674
  },
675
  {
676
+ "epoch": 22.22,
677
+ "learning_rate": 0.000699671176875736,
678
+ "loss": 0.0054,
679
+ "step": 74074
680
  },
681
  {
682
+ "epoch": 22.52,
683
+ "learning_rate": 0.0006941111777644471,
684
+ "loss": 0.0061,
685
+ "step": 75075
686
  },
687
  {
688
+ "epoch": 22.82,
689
+ "learning_rate": 0.0006885511786531583,
690
+ "loss": 0.0056,
691
+ "step": 76076
692
  },
693
  {
694
  "epoch": 23.0,
695
+ "eval_loss": 0.028730520978569984,
696
+ "eval_max_distance": 10,
697
  "eval_mean_distance": 0,
698
+ "eval_runtime": 0.2431,
699
+ "eval_samples_per_second": 205.684,
700
+ "eval_steps_per_second": 8.227,
701
+ "step": 76682
702
  },
703
  {
704
+ "epoch": 23.12,
705
+ "learning_rate": 0.0006829911795418694,
706
+ "loss": 0.0054,
707
+ "step": 77077
708
  },
709
  {
710
+ "epoch": 23.42,
711
+ "learning_rate": 0.0006774311804305806,
712
+ "loss": 0.0052,
713
+ "step": 78078
714
  },
715
  {
716
+ "epoch": 23.72,
717
+ "learning_rate": 0.0006718711813192917,
718
+ "loss": 0.0053,
719
+ "step": 79079
720
  },
721
  {
722
  "epoch": 24.0,
723
+ "eval_loss": 0.031197942793369293,
724
+ "eval_max_distance": 12,
725
+ "eval_mean_distance": 0,
726
+ "eval_runtime": 0.2517,
727
+ "eval_samples_per_second": 198.643,
728
+ "eval_steps_per_second": 7.946,
729
+ "step": 80016
730
  },
731
  {
732
+ "epoch": 24.02,
733
+ "learning_rate": 0.0006663111822080029,
734
+ "loss": 0.0054,
735
+ "step": 80080
736
  },
737
  {
738
+ "epoch": 24.32,
739
+ "learning_rate": 0.000660751183096714,
740
+ "loss": 0.0044,
741
+ "step": 81081
742
  },
743
  {
744
+ "epoch": 24.62,
745
+ "learning_rate": 0.000655191183985425,
746
+ "loss": 0.0048,
747
+ "step": 82082
748
  },
749
  {
750
  "epoch": 24.92,
751
+ "learning_rate": 0.0006496311848741363,
752
+ "loss": 0.0049,
753
+ "step": 83083
754
  },
755
  {
756
  "epoch": 25.0,
757
+ "eval_loss": 0.0276066605001688,
758
+ "eval_max_distance": 11,
759
+ "eval_mean_distance": 0,
760
+ "eval_runtime": 0.2475,
761
+ "eval_samples_per_second": 202.046,
762
+ "eval_steps_per_second": 8.082,
763
+ "step": 83350
764
  },
765
  {
766
  "epoch": 25.22,
767
+ "learning_rate": 0.0006440711857628475,
768
+ "loss": 0.0045,
769
+ "step": 84084
770
  },
771
  {
772
  "epoch": 25.52,
773
+ "learning_rate": 0.0006385111866515586,
774
  "loss": 0.0045,
775
+ "step": 85085
776
  },
777
  {
778
  "epoch": 25.82,
779
+ "learning_rate": 0.0006329511875402698,
780
+ "loss": 0.0053,
781
+ "step": 86086
782
  },
783
  {
784
  "epoch": 26.0,
785
+ "eval_loss": 0.030818996950984,
786
+ "eval_max_distance": 10,
787
+ "eval_mean_distance": 0,
788
+ "eval_runtime": 0.2424,
789
+ "eval_samples_per_second": 206.301,
790
+ "eval_steps_per_second": 8.252,
791
+ "step": 86684
792
  },
793
  {
794
  "epoch": 26.12,
795
+ "learning_rate": 0.0006273911884289809,
796
+ "loss": 0.0045,
797
+ "step": 87087
798
  },
799
  {
800
  "epoch": 26.42,
801
+ "learning_rate": 0.000621831189317692,
802
+ "loss": 0.0041,
803
+ "step": 88088
804
  },
805
  {
806
  "epoch": 26.72,
807
+ "learning_rate": 0.0006162711902064032,
808
+ "loss": 0.0041,
809
+ "step": 89089
810
  },
811
  {
812
  "epoch": 27.0,
813
+ "eval_loss": 0.027929000556468964,
814
+ "eval_max_distance": 10,
815
+ "eval_mean_distance": 0,
816
+ "eval_runtime": 0.2471,
817
+ "eval_samples_per_second": 202.312,
818
+ "eval_steps_per_second": 8.092,
819
+ "step": 90018
820
  },
821
  {
822
  "epoch": 27.02,
823
+ "learning_rate": 0.0006107111910951143,
824
  "loss": 0.0043,
825
+ "step": 90090
826
  },
827
  {
828
  "epoch": 27.32,
829
+ "learning_rate": 0.0006051511919838255,
830
+ "loss": 0.0038,
831
+ "step": 91091
832
  },
833
  {
834
  "epoch": 27.62,
835
+ "learning_rate": 0.0005995911928725366,
836
+ "loss": 0.0038,
837
+ "step": 92092
838
  },
839
  {
840
  "epoch": 27.92,
841
+ "learning_rate": 0.0005940311937612478,
842
+ "loss": 0.0041,
843
+ "step": 93093
844
  },
845
  {
846
  "epoch": 28.0,
847
+ "eval_loss": 0.029230400919914246,
848
+ "eval_max_distance": 11,
849
+ "eval_mean_distance": 0,
850
+ "eval_runtime": 0.2482,
851
+ "eval_samples_per_second": 201.481,
852
+ "eval_steps_per_second": 8.059,
853
+ "step": 93352
854
  },
855
  {
856
  "epoch": 28.22,
857
+ "learning_rate": 0.0005884711946499589,
858
  "loss": 0.0037,
859
+ "step": 94094
860
  },
861
  {
862
  "epoch": 28.52,
863
+ "learning_rate": 0.00058291119553867,
864
+ "loss": 0.0033,
865
+ "step": 95095
866
  },
867
  {
868
  "epoch": 28.82,
869
+ "learning_rate": 0.0005773511964273812,
870
+ "loss": 0.0037,
871
+ "step": 96096
872
  },
873
  {
874
  "epoch": 29.0,
875
+ "eval_loss": 0.030607566237449646,
876
+ "eval_max_distance": 11,
877
+ "eval_mean_distance": 0,
878
+ "eval_runtime": 0.2429,
879
+ "eval_samples_per_second": 205.838,
880
+ "eval_steps_per_second": 8.234,
881
+ "step": 96686
882
  },
883
  {
884
  "epoch": 29.12,
885
+ "learning_rate": 0.0005717911973160923,
886
+ "loss": 0.0036,
887
+ "step": 97097
888
  },
889
  {
890
  "epoch": 29.42,
891
+ "learning_rate": 0.0005662311982048035,
892
  "loss": 0.0033,
893
+ "step": 98098
894
  },
895
  {
896
  "epoch": 29.72,
897
+ "learning_rate": 0.0005606711990935146,
898
+ "loss": 0.0035,
899
+ "step": 99099
900
  },
901
  {
902
  "epoch": 30.0,
903
+ "eval_loss": 0.027241094037890434,
904
+ "eval_max_distance": 12,
905
+ "eval_mean_distance": 0,
906
+ "eval_runtime": 0.2466,
907
+ "eval_samples_per_second": 202.757,
908
+ "eval_steps_per_second": 8.11,
909
+ "step": 100020
910
  },
911
  {
912
  "epoch": 30.02,
913
+ "learning_rate": 0.0005551111999822258,
914
+ "loss": 0.0033,
915
+ "step": 100100
916
  },
917
  {
918
  "epoch": 30.32,
919
+ "learning_rate": 0.0005495512008709369,
920
  "loss": 0.003,
921
+ "step": 101101
922
  },
923
  {
924
  "epoch": 30.62,
925
+ "learning_rate": 0.0005439912017596481,
926
+ "loss": 0.0031,
927
+ "step": 102102
928
  },
929
  {
930
  "epoch": 30.92,
931
+ "learning_rate": 0.0005384312026483592,
932
+ "loss": 0.0032,
933
+ "step": 103103
934
  },
935
  {
936
  "epoch": 31.0,
937
+ "eval_loss": 0.0254651065915823,
938
+ "eval_max_distance": 9,
939
+ "eval_mean_distance": 0,
940
+ "eval_runtime": 0.2446,
941
+ "eval_samples_per_second": 204.388,
942
+ "eval_steps_per_second": 8.176,
943
+ "step": 103354
944
  },
945
  {
946
  "epoch": 31.22,
947
+ "learning_rate": 0.0005328712035370704,
948
+ "loss": 0.0028,
949
+ "step": 104104
950
  },
951
  {
952
+ "epoch": 31.53,
953
+ "learning_rate": 0.0005273112044257815,
954
+ "loss": 0.0029,
955
+ "step": 105105
956
  },
957
  {
958
+ "epoch": 31.83,
959
+ "learning_rate": 0.0005217512053144927,
960
+ "loss": 0.0031,
961
+ "step": 106106
962
  },
963
  {
964
  "epoch": 32.0,
965
+ "eval_loss": 0.02928677573800087,
966
+ "eval_max_distance": 10,
967
+ "eval_mean_distance": 0,
968
+ "eval_runtime": 0.2518,
969
+ "eval_samples_per_second": 198.594,
970
+ "eval_steps_per_second": 7.944,
971
+ "step": 106688
972
  },
973
  {
974
+ "epoch": 32.13,
975
+ "learning_rate": 0.0005161912062032039,
976
  "loss": 0.0028,
977
+ "step": 107107
978
  },
979
  {
980
+ "epoch": 32.43,
981
+ "learning_rate": 0.0005106312070919149,
982
+ "loss": 0.0026,
983
+ "step": 108108
984
  },
985
  {
986
+ "epoch": 32.73,
987
+ "learning_rate": 0.0005050712079806262,
988
+ "loss": 0.0029,
989
+ "step": 109109
990
  },
991
  {
992
  "epoch": 33.0,
993
+ "eval_loss": 0.029988963156938553,
994
+ "eval_max_distance": 13,
995
  "eval_mean_distance": 0,
996
+ "eval_runtime": 0.2465,
997
+ "eval_samples_per_second": 202.802,
998
+ "eval_steps_per_second": 8.112,
999
+ "step": 110022
1000
  },
1001
  {
1002
+ "epoch": 33.03,
1003
+ "learning_rate": 0.0004995112088693373,
1004
+ "loss": 0.0027,
1005
+ "step": 110110
1006
  },
1007
  {
1008
+ "epoch": 33.33,
1009
+ "learning_rate": 0.0004939512097580485,
1010
+ "loss": 0.0025,
1011
+ "step": 111111
1012
  },
1013
  {
1014
+ "epoch": 33.63,
1015
+ "learning_rate": 0.0004883912106467596,
1016
  "loss": 0.0026,
1017
+ "step": 112112
1018
  },
1019
  {
1020
+ "epoch": 33.93,
1021
+ "learning_rate": 0.0004828312115354707,
1022
+ "loss": 0.0026,
1023
+ "step": 113113
1024
  },
1025
  {
1026
  "epoch": 34.0,
1027
+ "eval_loss": 0.03050011210143566,
1028
+ "eval_max_distance": 11,
1029
+ "eval_mean_distance": 0,
1030
+ "eval_runtime": 0.2507,
1031
+ "eval_samples_per_second": 199.458,
1032
+ "eval_steps_per_second": 7.978,
1033
+ "step": 113356
1034
  },
1035
  {
1036
+ "epoch": 34.23,
1037
+ "learning_rate": 0.00047727121242418185,
1038
+ "loss": 0.0025,
1039
+ "step": 114114
1040
  },
1041
  {
1042
+ "epoch": 34.53,
1043
+ "learning_rate": 0.00047171121331289294,
1044
  "loss": 0.0023,
1045
+ "step": 115115
1046
  },
1047
  {
1048
+ "epoch": 34.83,
1049
+ "learning_rate": 0.0004661512142016041,
1050
+ "loss": 0.0024,
1051
+ "step": 116116
1052
  },
1053
  {
1054
  "epoch": 35.0,
1055
+ "eval_loss": 0.027280788868665695,
1056
+ "eval_max_distance": 9,
1057
  "eval_mean_distance": 0,
1058
+ "eval_runtime": 0.2447,
1059
+ "eval_samples_per_second": 204.372,
1060
+ "eval_steps_per_second": 8.175,
1061
+ "step": 116690
1062
  },
1063
  {
1064
+ "epoch": 35.13,
1065
+ "learning_rate": 0.00046059121509031524,
1066
+ "loss": 0.0024,
1067
+ "step": 117117
1068
  },
1069
  {
1070
+ "epoch": 35.43,
1071
+ "learning_rate": 0.00045503121597902644,
1072
+ "loss": 0.0022,
1073
+ "step": 118118
1074
  },
1075
  {
1076
+ "epoch": 35.73,
1077
+ "learning_rate": 0.0004494712168677376,
1078
  "loss": 0.0023,
1079
+ "step": 119119
1080
  },
1081
  {
1082
  "epoch": 36.0,
1083
+ "eval_loss": 0.028403306379914284,
1084
+ "eval_max_distance": 10,
1085
  "eval_mean_distance": 0,
1086
+ "eval_runtime": 0.2435,
1087
+ "eval_samples_per_second": 205.364,
1088
+ "eval_steps_per_second": 8.215,
1089
+ "step": 120024
1090
  },
1091
  {
1092
+ "epoch": 36.03,
1093
+ "learning_rate": 0.00044391121775644874,
1094
+ "loss": 0.0022,
1095
+ "step": 120120
1096
  },
1097
  {
1098
+ "epoch": 36.33,
1099
+ "learning_rate": 0.0004383512186451599,
1100
+ "loss": 0.002,
1101
+ "step": 121121
1102
  },
1103
  {
1104
+ "epoch": 36.63,
1105
+ "learning_rate": 0.00043279121953387103,
1106
+ "loss": 0.0022,
1107
+ "step": 122122
1108
  },
1109
  {
1110
+ "epoch": 36.93,
1111
+ "learning_rate": 0.0004272312204225822,
1112
+ "loss": 0.0022,
1113
+ "step": 123123
1114
  },
1115
  {
1116
  "epoch": 37.0,
1117
+ "eval_loss": 0.03133893013000488,
1118
+ "eval_max_distance": 13,
1119
+ "eval_mean_distance": 0,
1120
+ "eval_runtime": 0.2436,
1121
+ "eval_samples_per_second": 205.289,
1122
+ "eval_steps_per_second": 8.212,
1123
+ "step": 123358
1124
  },
1125
  {
1126
+ "epoch": 37.23,
1127
+ "learning_rate": 0.00042167122131129333,
1128
+ "loss": 0.0019,
1129
+ "step": 124124
1130
  },
1131
  {
1132
+ "epoch": 37.53,
1133
+ "learning_rate": 0.0004161112222000045,
1134
  "loss": 0.0019,
1135
+ "step": 125125
1136
  },
1137
  {
1138
+ "epoch": 37.83,
1139
+ "learning_rate": 0.0004105512230887156,
1140
+ "loss": 0.002,
1141
+ "step": 126126
1142
  },
1143
  {
1144
  "epoch": 38.0,
1145
+ "eval_loss": 0.034086938947439194,
1146
+ "eval_max_distance": 13,
1147
+ "eval_mean_distance": 0,
1148
+ "eval_runtime": 0.242,
1149
+ "eval_samples_per_second": 206.579,
1150
+ "eval_steps_per_second": 8.263,
1151
+ "step": 126692
1152
  },
1153
  {
1154
+ "epoch": 38.13,
1155
+ "learning_rate": 0.0004049912239774268,
1156
+ "loss": 0.002,
1157
+ "step": 127127
1158
  },
1159
  {
1160
+ "epoch": 38.43,
1161
+ "learning_rate": 0.00039943122486613787,
1162
  "loss": 0.0018,
1163
+ "step": 128128
1164
  },
1165
  {
1166
+ "epoch": 38.73,
1167
+ "learning_rate": 0.000393871225754849,
1168
  "loss": 0.0017,
1169
+ "step": 129129
1170
  },
1171
  {
1172
  "epoch": 39.0,
1173
+ "eval_loss": 0.03005034476518631,
1174
+ "eval_max_distance": 13,
1175
  "eval_mean_distance": 0,
1176
+ "eval_runtime": 0.2407,
1177
+ "eval_samples_per_second": 207.711,
1178
+ "eval_steps_per_second": 8.308,
1179
+ "step": 130026
1180
  },
1181
  {
1182
+ "epoch": 39.03,
1183
+ "learning_rate": 0.00038831122664356016,
1184
+ "loss": 0.0018,
1185
+ "step": 130130
1186
  },
1187
  {
1188
+ "epoch": 39.33,
1189
+ "learning_rate": 0.0003827512275322713,
1190
  "loss": 0.0016,
1191
+ "step": 131131
1192
  },
1193
  {
1194
+ "epoch": 39.63,
1195
+ "learning_rate": 0.00037719122842098246,
1196
  "loss": 0.0017,
1197
+ "step": 132132
1198
  },
1199
  {
1200
+ "epoch": 39.93,
1201
+ "learning_rate": 0.0003716312293096936,
1202
  "loss": 0.0017,
1203
+ "step": 133133
1204
  },
1205
  {
1206
  "epoch": 40.0,
1207
+ "eval_loss": 0.03297489508986473,
1208
+ "eval_max_distance": 11,
1209
  "eval_mean_distance": 0,
1210
+ "eval_runtime": 0.2478,
1211
+ "eval_samples_per_second": 201.796,
1212
+ "eval_steps_per_second": 8.072,
1213
+ "step": 133360
1214
  },
1215
  {
1216
+ "epoch": 40.23,
1217
+ "learning_rate": 0.00036607123019840476,
1218
  "loss": 0.0015,
1219
+ "step": 134134
1220
  },
1221
  {
1222
+ "epoch": 40.53,
1223
+ "learning_rate": 0.0003605112310871159,
1224
+ "loss": 0.0015,
1225
+ "step": 135135
1226
  },
1227
  {
1228
+ "epoch": 40.83,
1229
+ "learning_rate": 0.00035495123197582705,
1230
  "loss": 0.0016,
1231
+ "step": 136136
1232
  },
1233
  {
1234
  "epoch": 41.0,
1235
+ "eval_loss": 0.03444751352071762,
1236
+ "eval_max_distance": 11,
1237
  "eval_mean_distance": 0,
1238
+ "eval_runtime": 0.2543,
1239
+ "eval_samples_per_second": 196.583,
1240
+ "eval_steps_per_second": 7.863,
1241
+ "step": 136694
1242
  },
1243
  {
1244
+ "epoch": 41.13,
1245
+ "learning_rate": 0.0003493912328645382,
1246
  "loss": 0.0015,
1247
+ "step": 137137
1248
  },
1249
  {
1250
  "epoch": 41.43,
1251
+ "learning_rate": 0.00034383123375324935,
1252
  "loss": 0.0014,
1253
+ "step": 138138
1254
  },
1255
  {
1256
  "epoch": 41.73,
1257
+ "learning_rate": 0.0003382712346419605,
1258
+ "loss": 0.0014,
1259
+ "step": 139139
1260
  },
1261
  {
1262
  "epoch": 42.0,
1263
+ "eval_loss": 0.033661480993032455,
1264
+ "eval_max_distance": 10,
1265
+ "eval_mean_distance": 0,
1266
+ "eval_runtime": 0.251,
1267
+ "eval_samples_per_second": 199.199,
1268
+ "eval_steps_per_second": 7.968,
1269
+ "step": 140028
1270
  },
1271
  {
1272
  "epoch": 42.03,
1273
+ "learning_rate": 0.0003327112355306717,
1274
+ "loss": 0.0015,
1275
+ "step": 140140
1276
  },
1277
  {
1278
  "epoch": 42.33,
1279
+ "learning_rate": 0.0003271512364193828,
1280
+ "loss": 0.0014,
1281
+ "step": 141141
1282
  },
1283
  {
1284
  "epoch": 42.63,
1285
+ "learning_rate": 0.00032159123730809394,
1286
+ "loss": 0.0014,
1287
+ "step": 142142
1288
  },
1289
  {
1290
  "epoch": 42.93,
1291
+ "learning_rate": 0.0003160312381968051,
1292
+ "loss": 0.0013,
1293
+ "step": 143143
1294
  },
1295
  {
1296
  "epoch": 43.0,
1297
+ "eval_loss": 0.029230637475848198,
1298
+ "eval_max_distance": 12,
1299
  "eval_mean_distance": 0,
1300
+ "eval_runtime": 0.2458,
1301
+ "eval_samples_per_second": 203.394,
1302
+ "eval_steps_per_second": 8.136,
1303
+ "step": 143362
1304
  },
1305
  {
1306
  "epoch": 43.23,
1307
+ "learning_rate": 0.00031047123908551624,
1308
+ "loss": 0.0012,
1309
+ "step": 144144
1310
  },
1311
  {
1312
  "epoch": 43.53,
1313
+ "learning_rate": 0.0003049112399742274,
1314
+ "loss": 0.0012,
1315
+ "step": 145145
1316
  },
1317
  {
1318
+ "epoch": 43.84,
1319
+ "learning_rate": 0.00029935124086293854,
1320
+ "loss": 0.0012,
1321
+ "step": 146146
1322
  },
1323
  {
1324
  "epoch": 44.0,
1325
+ "eval_loss": 0.03386835753917694,
1326
+ "eval_max_distance": 11,
1327
+ "eval_mean_distance": 0,
1328
+ "eval_runtime": 0.248,
1329
+ "eval_samples_per_second": 201.602,
1330
+ "eval_steps_per_second": 8.064,
1331
+ "step": 146696
1332
  },
1333
  {
1334
+ "epoch": 44.14,
1335
+ "learning_rate": 0.0002937912417516497,
1336
  "loss": 0.0012,
1337
+ "step": 147147
1338
  },
1339
  {
1340
+ "epoch": 44.44,
1341
+ "learning_rate": 0.00028823124264036083,
1342
  "loss": 0.0011,
1343
+ "step": 148148
1344
  },
1345
  {
1346
+ "epoch": 44.74,
1347
+ "learning_rate": 0.000282671243529072,
1348
+ "loss": 0.0012,
1349
+ "step": 149149
1350
  },
1351
  {
1352
  "epoch": 45.0,
1353
+ "eval_loss": 0.03299795091152191,
1354
+ "eval_max_distance": 11,
1355
+ "eval_mean_distance": 0,
1356
+ "eval_runtime": 0.2516,
1357
+ "eval_samples_per_second": 198.692,
1358
+ "eval_steps_per_second": 7.948,
1359
+ "step": 150030
1360
  },
1361
  {
1362
+ "epoch": 45.04,
1363
+ "learning_rate": 0.00027711124441778313,
1364
+ "loss": 0.0012,
1365
+ "step": 150150
1366
  },
1367
  {
1368
+ "epoch": 45.34,
1369
+ "learning_rate": 0.0002715512453064943,
1370
  "loss": 0.001,
1371
+ "step": 151151
1372
  },
1373
  {
1374
+ "epoch": 45.64,
1375
+ "learning_rate": 0.0002659912461952054,
1376
  "loss": 0.0011,
1377
+ "step": 152152
1378
+ },
1379
+ {
1380
+ "epoch": 45.94,
1381
+ "learning_rate": 0.0002604312470839166,
1382
+ "loss": 0.001,
1383
+ "step": 153153
1384
  },
1385
  {
1386
  "epoch": 46.0,
1387
+ "eval_loss": 0.030699940398335457,
1388
+ "eval_max_distance": 11,
1389
  "eval_mean_distance": 0,
1390
+ "eval_runtime": 0.2486,
1391
+ "eval_samples_per_second": 201.091,
1392
+ "eval_steps_per_second": 8.044,
1393
+ "step": 153364
1394
  },
1395
  {
1396
+ "epoch": 46.24,
1397
+ "learning_rate": 0.0002548712479726277,
1398
  "loss": 0.001,
1399
+ "step": 154154
1400
  },
1401
  {
1402
+ "epoch": 46.54,
1403
+ "learning_rate": 0.00024931124886133887,
1404
+ "loss": 0.0009,
1405
+ "step": 155155
1406
  },
1407
  {
1408
+ "epoch": 46.84,
1409
+ "learning_rate": 0.00024375124975005,
1410
  "loss": 0.001,
1411
+ "step": 156156
1412
  },
1413
  {
1414
  "epoch": 47.0,
1415
+ "eval_loss": 0.032952647656202316,
1416
+ "eval_max_distance": 10,
1417
  "eval_mean_distance": 0,
1418
+ "eval_runtime": 0.2471,
1419
+ "eval_samples_per_second": 202.373,
1420
+ "eval_steps_per_second": 8.095,
1421
+ "step": 156698
1422
  },
1423
  {
1424
+ "epoch": 47.14,
1425
+ "learning_rate": 0.00023819125063876117,
1426
+ "loss": 0.0013,
1427
+ "step": 157157
1428
  },
1429
  {
1430
+ "epoch": 47.44,
1431
+ "learning_rate": 0.0002326312515274723,
1432
  "loss": 0.0009,
1433
+ "step": 158158
1434
  },
1435
  {
1436
+ "epoch": 47.74,
1437
+ "learning_rate": 0.00022707125241618344,
1438
  "loss": 0.0009,
1439
+ "step": 159159
1440
  },
1441
  {
1442
  "epoch": 48.0,
1443
+ "eval_loss": 0.03382818400859833,
1444
+ "eval_max_distance": 11,
1445
  "eval_mean_distance": 0,
1446
+ "eval_runtime": 0.2551,
1447
+ "eval_samples_per_second": 195.998,
1448
+ "eval_steps_per_second": 7.84,
1449
+ "step": 160032
1450
  },
1451
  {
1452
+ "epoch": 48.04,
1453
+ "learning_rate": 0.00022151125330489458,
1454
  "loss": 0.0009,
1455
+ "step": 160160
1456
  },
1457
  {
1458
+ "epoch": 48.34,
1459
+ "learning_rate": 0.00021595125419360573,
1460
  "loss": 0.0008,
1461
+ "step": 161161
1462
  },
1463
  {
1464
+ "epoch": 48.64,
1465
+ "learning_rate": 0.00021039125508231688,
1466
+ "loss": 0.0009,
1467
+ "step": 162162
1468
  },
1469
  {
1470
+ "epoch": 48.94,
1471
+ "learning_rate": 0.00020483125597102803,
1472
+ "loss": 0.0009,
1473
+ "step": 163163
1474
  },
1475
  {
1476
  "epoch": 49.0,
1477
+ "eval_loss": 0.02877364680171013,
1478
+ "eval_max_distance": 10,
1479
+ "eval_mean_distance": 0,
1480
+ "eval_runtime": 0.2518,
1481
+ "eval_samples_per_second": 198.574,
1482
+ "eval_steps_per_second": 7.943,
1483
+ "step": 163366
1484
  },
1485
  {
1486
+ "epoch": 49.24,
1487
+ "learning_rate": 0.00019927125685973918,
1488
  "loss": 0.0008,
1489
+ "step": 164164
1490
  },
1491
  {
1492
+ "epoch": 49.54,
1493
+ "learning_rate": 0.0001937112577484503,
1494
  "loss": 0.0008,
1495
+ "step": 165165
1496
  },
1497
  {
1498
+ "epoch": 49.84,
1499
+ "learning_rate": 0.00018815125863716145,
1500
  "loss": 0.0008,
1501
+ "step": 166166
1502
  },
1503
  {
1504
  "epoch": 50.0,
1505
+ "eval_loss": 0.02558927983045578,
1506
+ "eval_max_distance": 10,
1507
  "eval_mean_distance": 0,
1508
+ "eval_runtime": 0.2461,
1509
+ "eval_samples_per_second": 203.155,
1510
+ "eval_steps_per_second": 8.126,
1511
+ "step": 166700
1512
  },
1513
  {
1514
+ "epoch": 50.14,
1515
+ "learning_rate": 0.0001825912595258726,
1516
+ "loss": 0.0007,
1517
+ "step": 167167
1518
  },
1519
  {
1520
+ "epoch": 50.44,
1521
+ "learning_rate": 0.00017703126041458374,
1522
  "loss": 0.0007,
1523
+ "step": 168168
1524
  },
1525
  {
1526
+ "epoch": 50.74,
1527
+ "learning_rate": 0.00017147126130329492,
1528
+ "loss": 0.0007,
1529
+ "step": 169169
1530
  },
1531
  {
1532
  "epoch": 51.0,
1533
+ "eval_loss": 0.02841602824628353,
1534
+ "eval_max_distance": 11,
1535
  "eval_mean_distance": 0,
1536
+ "eval_runtime": 0.2394,
1537
+ "eval_samples_per_second": 208.815,
1538
+ "eval_steps_per_second": 8.353,
1539
+ "step": 170034
1540
  },
1541
  {
1542
+ "epoch": 51.04,
1543
+ "learning_rate": 0.00016591126219200607,
1544
  "loss": 0.0007,
1545
+ "step": 170170
1546
  },
1547
  {
1548
+ "epoch": 51.34,
1549
+ "learning_rate": 0.0001603512630807172,
1550
  "loss": 0.0007,
1551
+ "step": 171171
1552
  },
1553
  {
1554
+ "epoch": 51.64,
1555
+ "learning_rate": 0.00015479126396942834,
1556
+ "loss": 0.0006,
1557
+ "step": 172172
1558
  },
1559
  {
1560
+ "epoch": 51.94,
1561
+ "learning_rate": 0.00014923126485813948,
1562
+ "loss": 0.0006,
1563
+ "step": 173173
1564
  },
1565
  {
1566
  "epoch": 52.0,
1567
+ "eval_loss": 0.03416401892900467,
1568
+ "eval_max_distance": 10,
1569
  "eval_mean_distance": 0,
1570
+ "eval_runtime": 0.2536,
1571
+ "eval_samples_per_second": 197.147,
1572
+ "eval_steps_per_second": 7.886,
1573
+ "step": 173368
1574
  },
1575
  {
1576
+ "epoch": 52.24,
1577
+ "learning_rate": 0.00014367126574685063,
1578
  "loss": 0.0006,
1579
+ "step": 174174
1580
  },
1581
  {
1582
+ "epoch": 52.54,
1583
+ "learning_rate": 0.00013811126663556178,
1584
  "loss": 0.0006,
1585
+ "step": 175175
1586
  },
1587
  {
1588
+ "epoch": 52.84,
1589
+ "learning_rate": 0.00013255126752427293,
1590
  "loss": 0.0006,
1591
+ "step": 176176
1592
  },
1593
  {
1594
  "epoch": 53.0,
1595
+ "eval_loss": 0.031156664714217186,
1596
+ "eval_max_distance": 10,
1597
+ "eval_mean_distance": 0,
1598
+ "eval_runtime": 0.2541,
1599
+ "eval_samples_per_second": 196.804,
1600
+ "eval_steps_per_second": 7.872,
1601
+ "step": 176702
1602
  },
1603
  {
1604
+ "epoch": 53.14,
1605
+ "learning_rate": 0.00012699126841298408,
1606
  "loss": 0.0006,
1607
+ "step": 177177
1608
  },
1609
  {
1610
+ "epoch": 53.44,
1611
+ "learning_rate": 0.00012143126930169523,
1612
+ "loss": 0.0005,
1613
+ "step": 178178
1614
  },
1615
  {
1616
+ "epoch": 53.74,
1617
+ "learning_rate": 0.00011587127019040637,
1618
  "loss": 0.0005,
1619
+ "step": 179179
1620
  },
1621
  {
1622
  "epoch": 54.0,
1623
+ "eval_loss": 0.03255148231983185,
1624
+ "eval_max_distance": 10,
1625
  "eval_mean_distance": 0,
1626
+ "eval_runtime": 0.2469,
1627
+ "eval_samples_per_second": 202.55,
1628
+ "eval_steps_per_second": 8.102,
1629
+ "step": 180036
1630
  },
1631
  {
1632
+ "epoch": 54.04,
1633
+ "learning_rate": 0.00011031127107911751,
1634
  "loss": 0.0005,
1635
+ "step": 180180
1636
  },
1637
  {
1638
+ "epoch": 54.34,
1639
+ "learning_rate": 0.00010475127196782866,
1640
+ "loss": 0.0006,
1641
+ "step": 181181
1642
  },
1643
  {
1644
+ "epoch": 54.64,
1645
+ "learning_rate": 9.91912728565398e-05,
1646
  "loss": 0.0005,
1647
+ "step": 182182
1648
  },
1649
  {
1650
+ "epoch": 54.94,
1651
+ "learning_rate": 9.363127374525095e-05,
1652
+ "loss": 0.0005,
1653
+ "step": 183183
1654
  },
1655
  {
1656
  "epoch": 55.0,
1657
+ "eval_loss": 0.030407395213842392,
1658
+ "eval_max_distance": 11,
1659
  "eval_mean_distance": 0,
1660
+ "eval_runtime": 0.2417,
1661
+ "eval_samples_per_second": 206.906,
1662
+ "eval_steps_per_second": 8.276,
1663
+ "step": 183370
1664
  },
1665
  {
1666
+ "epoch": 55.24,
1667
+ "learning_rate": 8.80712746339621e-05,
1668
  "loss": 0.0005,
1669
+ "step": 184184
1670
  },
1671
  {
1672
+ "epoch": 55.54,
1673
+ "learning_rate": 8.251127552267325e-05,
1674
  "loss": 0.0004,
1675
+ "step": 185185
1676
  },
1677
  {
1678
+ "epoch": 55.84,
1679
+ "learning_rate": 7.695127641138438e-05,
1680
  "loss": 0.0005,
1681
+ "step": 186186
1682
  },
1683
  {
1684
  "epoch": 56.0,
1685
+ "eval_loss": 0.02997196838259697,
1686
+ "eval_max_distance": 11,
1687
  "eval_mean_distance": 0,
1688
+ "eval_runtime": 0.2484,
1689
+ "eval_samples_per_second": 201.291,
1690
+ "eval_steps_per_second": 8.052,
1691
+ "step": 186704
1692
  },
1693
  {
1694
+ "epoch": 56.14,
1695
+ "learning_rate": 7.139127730009553e-05,
1696
  "loss": 0.0004,
1697
+ "step": 187187
1698
  },
1699
  {
1700
+ "epoch": 56.45,
1701
+ "learning_rate": 6.583127818880668e-05,
1702
  "loss": 0.0004,
1703
+ "step": 188188
1704
  },
1705
  {
1706
+ "epoch": 56.75,
1707
+ "learning_rate": 6.027127907751783e-05,
1708
  "loss": 0.0004,
1709
+ "step": 189189
1710
  },
1711
  {
1712
  "epoch": 57.0,
1713
+ "eval_loss": 0.03127776086330414,
1714
+ "eval_max_distance": 11,
1715
  "eval_mean_distance": 0,
1716
+ "eval_runtime": 0.2542,
1717
+ "eval_samples_per_second": 196.708,
1718
+ "eval_steps_per_second": 7.868,
1719
+ "step": 190038
1720
  },
1721
  {
1722
+ "epoch": 57.05,
1723
+ "learning_rate": 5.471127996622898e-05,
1724
  "loss": 0.0004,
1725
+ "step": 190190
1726
  },
1727
  {
1728
+ "epoch": 57.35,
1729
+ "learning_rate": 4.9151280854940125e-05,
1730
  "loss": 0.0004,
1731
+ "step": 191191
1732
  },
1733
  {
1734
+ "epoch": 57.65,
1735
+ "learning_rate": 4.359128174365127e-05,
1736
  "loss": 0.0004,
1737
+ "step": 192192
1738
  },
1739
  {
1740
+ "epoch": 57.95,
1741
+ "learning_rate": 3.803128263236242e-05,
1742
  "loss": 0.0003,
1743
+ "step": 193193
1744
  },
1745
  {
1746
  "epoch": 58.0,
1747
+ "eval_loss": 0.03212800994515419,
1748
+ "eval_max_distance": 11,
1749
  "eval_mean_distance": 0,
1750
+ "eval_runtime": 0.236,
1751
+ "eval_samples_per_second": 211.858,
1752
+ "eval_steps_per_second": 8.474,
1753
+ "step": 193372
1754
  },
1755
  {
1756
+ "epoch": 58.25,
1757
+ "learning_rate": 3.247128352107356e-05,
1758
+ "loss": 0.0003,
1759
+ "step": 194194
1760
  },
1761
  {
1762
+ "epoch": 58.55,
1763
+ "learning_rate": 2.691128440978471e-05,
1764
  "loss": 0.0004,
1765
+ "step": 195195
1766
  },
1767
  {
1768
+ "epoch": 58.85,
1769
+ "learning_rate": 2.135128529849586e-05,
1770
  "loss": 0.0003,
1771
+ "step": 196196
1772
  },
1773
  {
1774
  "epoch": 59.0,
1775
+ "eval_loss": 0.031559597700834274,
1776
+ "eval_max_distance": 10,
1777
  "eval_mean_distance": 0,
1778
+ "eval_runtime": 0.2475,
1779
+ "eval_samples_per_second": 201.99,
1780
+ "eval_steps_per_second": 8.08,
1781
+ "step": 196706
1782
  },
1783
  {
1784
+ "epoch": 59.15,
1785
+ "learning_rate": 1.5791286187207e-05,
1786
+ "loss": 0.0003,
1787
+ "step": 197197
1788
  },
1789
  {
1790
+ "epoch": 59.45,
1791
+ "learning_rate": 1.023128707591815e-05,
1792
+ "loss": 0.0003,
1793
+ "step": 198198
1794
  },
1795
  {
1796
+ "epoch": 59.75,
1797
+ "learning_rate": 4.671287964629296e-06,
1798
  "loss": 0.0004,
1799
+ "step": 199199
1800
  },
1801
  {
1802
  "epoch": 60.0,
1803
+ "eval_loss": 0.03177854046225548,
1804
+ "eval_max_distance": 11,
1805
  "eval_mean_distance": 0,
1806
+ "eval_runtime": 0.2438,
1807
+ "eval_samples_per_second": 205.126,
1808
+ "eval_steps_per_second": 8.205,
1809
+ "step": 200040
1810
  },
1811
  {
1812
  "epoch": 60.0,
1813
+ "step": 200040,
1814
+ "total_flos": 1.1617191885791232e+17,
1815
+ "train_loss": 0.03170474885008116,
1816
+ "train_runtime": 15592.8332,
1817
+ "train_samples_per_second": 384.846,
1818
+ "train_steps_per_second": 12.829
1819
  }
1820
  ],
1821
+ "logging_steps": 1001,
1822
+ "max_steps": 200040,
1823
  "num_train_epochs": 60,
1824
+ "save_steps": 2001,
1825
+ "total_flos": 1.1617191885791232e+17,
1826
  "trial_name": null,
1827
  "trial_params": null
1828
  }
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:970254644cb218db4599e9310f1083ff5880c007630cc4c6dbec952da37dd2a9
3
  size 4091
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:100e47428886cfceeb4983e829afe7caff9578529dd77c77ba43967c2229d9ca
3
  size 4091