OpenLeecher committed · Commit b4be6ac · verified · 1 Parent(s): 8973879

End of training

Files changed (5):
  1. README.md +3 -2
  2. all_results.json +12 -0
  3. eval_results.json +7 -0
  4. train_results.json +8 -0
  5. trainer_state.json +1637 -0
README.md CHANGED
@@ -4,6 +4,7 @@ license: llama3.1
 base_model: meta-llama/Llama-3.1-8B
 tags:
 - llama-factory
+- full
 - generated_from_trainer
 model-index:
 - name: llama_8b_lima_43
@@ -15,9 +16,9 @@ should probably proofread and complete it, then remove this comment. -->

 # llama_8b_lima_43

-This model is a fine-tuned version of [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) on the None dataset.
+This model is a fine-tuned version of [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) on the open_webui_dataset dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.9356
+- Loss: 0.9357

 ## Model description

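The README reports an evaluation loss of 0.9357. Assuming this is the standard per-token cross-entropy loss of a causal language model (as reported by the HF Trainer), a common way to interpret it is as perplexity, `exp(loss)` — a quick sketch:

```python
import math

# The README reports an evaluation loss of 0.9357 for llama_8b_lima_43.
# For a causal LM trained with cross-entropy, perplexity is exp(loss).
eval_loss = 0.9357
perplexity = math.exp(eval_loss)
print(f"perplexity ~= {perplexity:.3f}")
```

So the model's held-out perplexity is roughly 2.55 tokens of effective uncertainty per position.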
all_results.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "epoch": 0.9996249062265566,
+ "eval_loss": 0.9357157349586487,
+ "eval_runtime": 20.4842,
+ "eval_samples_per_second": 9.764,
+ "eval_steps_per_second": 2.441,
+ "total_flos": 1.5465201990107136e+17,
+ "train_loss": 0.9640034305221815,
+ "train_runtime": 13404.8173,
+ "train_samples_per_second": 2.386,
+ "train_steps_per_second": 0.08
+ }
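The throughput figures above are internally consistent: rate × runtime recovers the underlying counts, up to the rounding applied to the rates in the JSON. A small check (global_step 1066 is taken from trainer_state.json below):

```python
# Cross-check the reported throughput figures in all_results.json:
# rate * runtime should recover the underlying counts (up to rounding,
# since the rates themselves are rounded in the JSON).
eval_runtime = 20.4842
eval_samples_per_second = 9.764
train_runtime = 13404.8173
global_step = 1066  # from trainer_state.json

eval_samples = eval_samples_per_second * eval_runtime
print(round(eval_samples))                     # ~200 evaluation samples
print(round(global_step / train_runtime, 2))   # 0.08 train_steps_per_second
```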
eval_results.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "epoch": 0.9996249062265566,
+ "eval_loss": 0.9357157349586487,
+ "eval_runtime": 20.4842,
+ "eval_samples_per_second": 9.764,
+ "eval_steps_per_second": 2.441
+ }
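Since eval_results.json is plain JSON, the reported metrics can be read back directly; shown here with the file contents inlined rather than loaded from disk:

```python
import json

# Parse eval_results.json (contents inlined from the diff above).
eval_results = json.loads("""
{
    "epoch": 0.9996249062265566,
    "eval_loss": 0.9357157349586487,
    "eval_runtime": 20.4842,
    "eval_samples_per_second": 9.764,
    "eval_steps_per_second": 2.441
}
""")
print(round(eval_results["eval_loss"], 4))  # 0.9357, as in the README
```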
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 0.9996249062265566,
+ "total_flos": 1.5465201990107136e+17,
+ "train_loss": 0.9640034305221815,
+ "train_runtime": 13404.8173,
+ "train_samples_per_second": 2.386,
+ "train_steps_per_second": 0.08
+ }
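The fractional epoch 0.9996249… is consistent with the step count in trainer_state.json: the first log entry there advances the epoch by 0.0046886721680420105 over 5 optimizer steps (about 1066.4 steps per epoch), and 1066 steps at that per-step rate lands exactly on the reported value:

```python
# Relate the reported fractional epoch to the step count in
# trainer_state.json: per-step epoch increment from the first log entry.
epoch_per_step = 0.0046886721680420105 / 5   # ~1/1066.4 epoch per step
global_step = 1066
print(abs(epoch_per_step * global_step - 0.9996249062265566) < 1e-9)
```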
trainer_state.json ADDED
@@ -0,0 +1,1637 @@
+ {
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 0.9996249062265566,
+ "eval_steps": 80,
+ "global_step": 1066,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.0046886721680420105,
+ "grad_norm": 131.48858057089643,
+ "learning_rate": 5.5e-07,
+ "loss": 1.2775,
+ "step": 5
+ },
+ {
+ "epoch": 0.009377344336084021,
+ "grad_norm": 33.55585411699123,
+ "learning_rate": 1.1e-06,
+ "loss": 1.3472,
+ "step": 10
+ },
+ {
+ "epoch": 0.014066016504126031,
+ "grad_norm": 10.392565895406074,
+ "learning_rate": 1.6499999999999999e-06,
+ "loss": 1.1186,
+ "step": 15
+ },
+ {
+ "epoch": 0.018754688672168042,
+ "grad_norm": 13.30940048870882,
+ "learning_rate": 2.2e-06,
+ "loss": 1.1381,
+ "step": 20
+ },
+ {
+ "epoch": 0.023443360840210054,
+ "grad_norm": 25.46432441054564,
+ "learning_rate": 2.75e-06,
+ "loss": 1.1455,
+ "step": 25
+ },
+ {
+ "epoch": 0.028132033008252063,
+ "grad_norm": 26.00538399730873,
+ "learning_rate": 3.2999999999999997e-06,
+ "loss": 1.049,
+ "step": 30
+ },
+ {
+ "epoch": 0.032820705176294075,
+ "grad_norm": 9.35384218444162,
+ "learning_rate": 3.8499999999999996e-06,
+ "loss": 0.8965,
+ "step": 35
+ },
+ {
+ "epoch": 0.037509377344336084,
+ "grad_norm": 2.9429262000270113,
+ "learning_rate": 4.4e-06,
+ "loss": 1.1271,
+ "step": 40
+ },
+ {
+ "epoch": 0.04219804951237809,
+ "grad_norm": 3.2422986577839716,
+ "learning_rate": 4.95e-06,
+ "loss": 1.0061,
+ "step": 45
+ },
+ {
+ "epoch": 0.04688672168042011,
+ "grad_norm": 6.503070007858331,
+ "learning_rate": 5.5e-06,
+ "loss": 0.9871,
+ "step": 50
+ },
+ {
+ "epoch": 0.05157539384846212,
+ "grad_norm": 5.434874712126211,
+ "learning_rate": 5.414406436166232e-06,
+ "loss": 1.0584,
+ "step": 55
+ },
+ {
+ "epoch": 0.056264066016504126,
+ "grad_norm": 2.394040425022367,
+ "learning_rate": 5.32986463435603e-06,
+ "loss": 1.0672,
+ "step": 60
+ },
+ {
+ "epoch": 0.060952738184546135,
+ "grad_norm": 2.7812207561170776,
+ "learning_rate": 5.246366801851234e-06,
+ "loss": 0.9798,
+ "step": 65
+ },
+ {
+ "epoch": 0.06564141035258815,
+ "grad_norm": 3.4321299409489927,
+ "learning_rate": 5.163905165275343e-06,
+ "loss": 1.0684,
+ "step": 70
+ },
+ {
+ "epoch": 0.07033008252063015,
+ "grad_norm": 2.4331799242183005,
+ "learning_rate": 5.082471970641763e-06,
+ "loss": 1.1214,
+ "step": 75
+ },
+ {
+ "epoch": 0.07501875468867217,
+ "grad_norm": 2.66350051316822,
+ "learning_rate": 5.002059483402411e-06,
+ "loss": 1.0422,
+ "step": 80
+ },
+ {
+ "epoch": 0.07501875468867217,
+ "eval_loss": 1.0044387578964233,
+ "eval_runtime": 21.7707,
+ "eval_samples_per_second": 9.187,
+ "eval_steps_per_second": 2.297,
+ "step": 80
+ },
+ {
+ "epoch": 0.07970742685671418,
+ "grad_norm": 2.457085516495112,
+ "learning_rate": 4.922659988496696e-06,
+ "loss": 1.004,
+ "step": 85
+ },
+ {
+ "epoch": 0.08439609902475619,
+ "grad_norm": 2.673304521252877,
+ "learning_rate": 4.844265790400869e-06,
+ "loss": 1.1774,
+ "step": 90
+ },
+ {
+ "epoch": 0.0890847711927982,
+ "grad_norm": 2.9697389911450105,
+ "learning_rate": 4.766869213177739e-06,
+ "loss": 1.029,
+ "step": 95
+ },
+ {
+ "epoch": 0.09377344336084022,
+ "grad_norm": 2.7485846803270175,
+ "learning_rate": 4.690462600526791e-06,
+ "loss": 1.0735,
+ "step": 100
+ },
+ {
+ "epoch": 0.09846211552888222,
+ "grad_norm": 2.3597539988620047,
+ "learning_rate": 4.615038315834675e-06,
+ "loss": 1.0426,
+ "step": 105
+ },
+ {
+ "epoch": 0.10315078769692423,
+ "grad_norm": 10.287138782877946,
+ "learning_rate": 4.5405887422260886e-06,
+ "loss": 1.1257,
+ "step": 110
+ },
+ {
+ "epoch": 0.10783945986496624,
+ "grad_norm": 6.22099162659968,
+ "learning_rate": 4.467106282615065e-06,
+ "loss": 1.0203,
+ "step": 115
+ },
+ {
+ "epoch": 0.11252813203300825,
+ "grad_norm": 6.918830274001748,
+ "learning_rate": 4.394583359756651e-06,
+ "loss": 1.1145,
+ "step": 120
+ },
+ {
+ "epoch": 0.11721680420105027,
+ "grad_norm": 5.8742101902111,
+ "learning_rate": 4.323012416298999e-06,
+ "loss": 1.0943,
+ "step": 125
+ },
+ {
+ "epoch": 0.12190547636909227,
+ "grad_norm": 2.6642490667255525,
+ "learning_rate": 4.252385914835873e-06,
+ "loss": 1.087,
+ "step": 130
+ },
+ {
+ "epoch": 0.12659414853713427,
+ "grad_norm": 2.6965562685918023,
+ "learning_rate": 4.182696337959566e-06,
+ "loss": 1.0873,
+ "step": 135
+ },
+ {
+ "epoch": 0.1312828207051763,
+ "grad_norm": 2.951044122767734,
+ "learning_rate": 4.113936188314245e-06,
+ "loss": 0.8704,
+ "step": 140
+ },
+ {
+ "epoch": 0.1359714928732183,
+ "grad_norm": 2.101235354697959,
+ "learning_rate": 4.046097988649726e-06,
+ "loss": 1.1235,
+ "step": 145
+ },
+ {
+ "epoch": 0.1406601650412603,
+ "grad_norm": 3.2330279427425617,
+ "learning_rate": 3.979174281875685e-06,
+ "loss": 0.997,
+ "step": 150
+ },
+ {
+ "epoch": 0.14534883720930233,
+ "grad_norm": 3.3889833141888728,
+ "learning_rate": 3.9131576311163e-06,
+ "loss": 1.0179,
+ "step": 155
+ },
+ {
+ "epoch": 0.15003750937734434,
+ "grad_norm": 4.973298096388189,
+ "learning_rate": 3.848040619765356e-06,
+ "loss": 0.9127,
+ "step": 160
+ },
+ {
+ "epoch": 0.15003750937734434,
+ "eval_loss": 0.9606707692146301,
+ "eval_runtime": 20.2418,
+ "eval_samples_per_second": 9.881,
+ "eval_steps_per_second": 2.47,
+ "step": 160
+ },
+ {
+ "epoch": 0.15472618154538634,
+ "grad_norm": 8.29768606535662,
+ "learning_rate": 3.7838158515417857e-06,
+ "loss": 1.0887,
+ "step": 165
+ },
+ {
+ "epoch": 0.15941485371342837,
+ "grad_norm": 3.7810993101840147,
+ "learning_rate": 3.7204759505456866e-06,
+ "loss": 0.8966,
+ "step": 170
+ },
+ {
+ "epoch": 0.16410352588147037,
+ "grad_norm": 2.644220736384783,
+ "learning_rate": 3.65801356131479e-06,
+ "loss": 1.0543,
+ "step": 175
+ },
+ {
+ "epoch": 0.16879219804951237,
+ "grad_norm": 3.614698732279852,
+ "learning_rate": 3.596421348881407e-06,
+ "loss": 0.9387,
+ "step": 180
+ },
+ {
+ "epoch": 0.1734808702175544,
+ "grad_norm": 2.8015834886932787,
+ "learning_rate": 3.535691998829856e-06,
+ "loss": 1.0746,
+ "step": 185
+ },
+ {
+ "epoch": 0.1781695423855964,
+ "grad_norm": 2.564293653383082,
+ "learning_rate": 3.4758182173543725e-06,
+ "loss": 0.8235,
+ "step": 190
+ },
+ {
+ "epoch": 0.1828582145536384,
+ "grad_norm": 3.118218626122133,
+ "learning_rate": 3.4167927313175065e-06,
+ "loss": 0.9392,
+ "step": 195
+ },
+ {
+ "epoch": 0.18754688672168043,
+ "grad_norm": 3.1320015313206078,
+ "learning_rate": 3.358608288309036e-06,
+ "loss": 0.9951,
+ "step": 200
+ },
+ {
+ "epoch": 0.19223555888972244,
+ "grad_norm": 2.8693840620672026,
+ "learning_rate": 3.3012576567053635e-06,
+ "loss": 1.0835,
+ "step": 205
+ },
+ {
+ "epoch": 0.19692423105776444,
+ "grad_norm": 3.961730546242666,
+ "learning_rate": 3.2447336257294427e-06,
+ "loss": 0.8606,
+ "step": 210
+ },
+ {
+ "epoch": 0.20161290322580644,
+ "grad_norm": 3.380982676943891,
+ "learning_rate": 3.189029005511225e-06,
+ "loss": 1.0039,
+ "step": 215
+ },
+ {
+ "epoch": 0.20630157539384847,
+ "grad_norm": 3.072769902074738,
+ "learning_rate": 3.134136627148626e-06,
+ "loss": 0.9235,
+ "step": 220
+ },
+ {
+ "epoch": 0.21099024756189047,
+ "grad_norm": 2.245002916223262,
+ "learning_rate": 3.080049342769041e-06,
+ "loss": 1.06,
+ "step": 225
+ },
+ {
+ "epoch": 0.21567891972993247,
+ "grad_norm": 2.541443132900714,
+ "learning_rate": 3.026760025591393e-06,
+ "loss": 0.9432,
+ "step": 230
+ },
+ {
+ "epoch": 0.2203675918979745,
+ "grad_norm": 3.123240987262189,
+ "learning_rate": 2.97426156998874e-06,
+ "loss": 1.0341,
+ "step": 235
+ },
+ {
+ "epoch": 0.2250562640660165,
+ "grad_norm": 2.404173594460732,
+ "learning_rate": 2.9225468915514425e-06,
+ "loss": 0.8605,
+ "step": 240
+ },
+ {
+ "epoch": 0.2250562640660165,
+ "eval_loss": 0.944010317325592,
+ "eval_runtime": 20.3476,
+ "eval_samples_per_second": 9.829,
+ "eval_steps_per_second": 2.457,
+ "step": 240
+ },
+ {
+ "epoch": 0.2297449362340585,
+ "grad_norm": 2.839083125403713,
+ "learning_rate": 2.8716089271509e-06,
+ "loss": 0.986,
+ "step": 245
+ },
+ {
+ "epoch": 0.23443360840210054,
+ "grad_norm": 3.109434158155652,
+ "learning_rate": 2.8214406350038632e-06,
+ "loss": 0.8936,
+ "step": 250
+ },
+ {
+ "epoch": 0.23912228057014254,
+ "grad_norm": 2.5603812627943325,
+ "learning_rate": 2.772034994737337e-06,
+ "loss": 1.0787,
+ "step": 255
+ },
+ {
+ "epoch": 0.24381095273818454,
+ "grad_norm": 2.843918945845048,
+ "learning_rate": 2.7233850074540736e-06,
+ "loss": 0.8681,
+ "step": 260
+ },
+ {
+ "epoch": 0.24849962490622657,
+ "grad_norm": 3.295967379586164,
+ "learning_rate": 2.6754836957986757e-06,
+ "loss": 1.063,
+ "step": 265
+ },
+ {
+ "epoch": 0.25318829707426854,
+ "grad_norm": 2.6450503587985867,
+ "learning_rate": 2.6283241040243133e-06,
+ "loss": 0.9345,
+ "step": 270
+ },
+ {
+ "epoch": 0.25787696924231057,
+ "grad_norm": 2.6669681551281936,
+ "learning_rate": 2.5818992980600576e-06,
+ "loss": 0.9366,
+ "step": 275
+ },
+ {
+ "epoch": 0.2625656414103526,
+ "grad_norm": 2.365910075512645,
+ "learning_rate": 2.5362023655788563e-06,
+ "loss": 0.9222,
+ "step": 280
+ },
+ {
+ "epoch": 0.2672543135783946,
+ "grad_norm": 2.4773250316490336,
+ "learning_rate": 2.491226416066151e-06,
+ "loss": 0.9816,
+ "step": 285
+ },
+ {
+ "epoch": 0.2719429857464366,
+ "grad_norm": 2.051844651359742,
+ "learning_rate": 2.4469645808891426e-06,
+ "loss": 1.0592,
+ "step": 290
+ },
+ {
+ "epoch": 0.27663165791447863,
+ "grad_norm": 2.7110963822173875,
+ "learning_rate": 2.40341001336673e-06,
+ "loss": 0.9458,
+ "step": 295
+ },
+ {
+ "epoch": 0.2813203300825206,
+ "grad_norm": 2.1769151172554047,
+ "learning_rate": 2.3605558888401135e-06,
+ "loss": 0.8555,
+ "step": 300
+ },
+ {
+ "epoch": 0.28600900225056264,
+ "grad_norm": 2.626727731539155,
+ "learning_rate": 2.318395404744094e-06,
+ "loss": 1.1516,
+ "step": 305
+ },
+ {
+ "epoch": 0.29069767441860467,
+ "grad_norm": 2.306595833971501,
+ "learning_rate": 2.276921780679061e-06,
+ "loss": 0.8737,
+ "step": 310
+ },
+ {
+ "epoch": 0.29538634658664664,
+ "grad_norm": 2.569147248252582,
+ "learning_rate": 2.2361282584836925e-06,
+ "loss": 1.0032,
+ "step": 315
+ },
+ {
+ "epoch": 0.30007501875468867,
+ "grad_norm": 1.9670956149750123,
+ "learning_rate": 2.1960081023083778e-06,
+ "loss": 0.9068,
+ "step": 320
+ },
+ {
+ "epoch": 0.30007501875468867,
+ "eval_loss": 0.9342896938323975,
+ "eval_runtime": 20.3147,
+ "eval_samples_per_second": 9.845,
+ "eval_steps_per_second": 2.461,
+ "step": 320
+ },
+ {
+ "epoch": 0.3047636909227307,
+ "grad_norm": 2.0859945243647395,
+ "learning_rate": 2.156554598689365e-06,
+ "loss": 1.0191,
+ "step": 325
+ },
+ {
+ "epoch": 0.3094523630907727,
+ "grad_norm": 1.9655466100226573,
+ "learning_rate": 2.117761056623659e-06,
+ "loss": 0.9328,
+ "step": 330
+ },
+ {
+ "epoch": 0.3141410352588147,
+ "grad_norm": 2.581123721405676,
+ "learning_rate": 2.0796208076446752e-06,
+ "loss": 1.0101,
+ "step": 335
+ },
+ {
+ "epoch": 0.31882970742685673,
+ "grad_norm": 2.443562264938451,
+ "learning_rate": 2.0421272058986607e-06,
+ "loss": 0.9154,
+ "step": 340
+ },
+ {
+ "epoch": 0.3235183795948987,
+ "grad_norm": 2.3271716313484987,
+ "learning_rate": 2.0052736282219008e-06,
+ "loss": 1.044,
+ "step": 345
+ },
+ {
+ "epoch": 0.32820705176294074,
+ "grad_norm": 2.4236414480748834,
+ "learning_rate": 1.9690534742187182e-06,
+ "loss": 1.0251,
+ "step": 350
+ },
+ {
+ "epoch": 0.33289572393098277,
+ "grad_norm": 2.5311887907377253,
+ "learning_rate": 1.9334601663402865e-06,
+ "loss": 0.9126,
+ "step": 355
+ },
+ {
+ "epoch": 0.33758439609902474,
+ "grad_norm": 2.19719594303758,
+ "learning_rate": 1.898487149964267e-06,
+ "loss": 0.9877,
+ "step": 360
+ },
+ {
+ "epoch": 0.34227306826706677,
+ "grad_norm": 3.0147002131335805,
+ "learning_rate": 1.8641278934752799e-06,
+ "loss": 0.9057,
+ "step": 365
+ },
+ {
+ "epoch": 0.3469617404351088,
+ "grad_norm": 2.134323364449907,
+ "learning_rate": 1.8303758883462328e-06,
+ "loss": 1.0681,
+ "step": 370
+ },
+ {
+ "epoch": 0.3516504126031508,
+ "grad_norm": 2.3509776870701424,
+ "learning_rate": 1.7972246492205194e-06,
+ "loss": 0.9568,
+ "step": 375
+ },
+ {
+ "epoch": 0.3563390847711928,
+ "grad_norm": 2.1090286048701814,
+ "learning_rate": 1.7646677139950976e-06,
+ "loss": 1.032,
+ "step": 380
+ },
+ {
+ "epoch": 0.36102775693923483,
+ "grad_norm": 2.36149114890146,
+ "learning_rate": 1.7326986439044696e-06,
+ "loss": 0.9867,
+ "step": 385
+ },
+ {
+ "epoch": 0.3657164291072768,
+ "grad_norm": 2.255024349073518,
+ "learning_rate": 1.701311023605583e-06,
+ "loss": 0.8747,
+ "step": 390
+ },
+ {
+ "epoch": 0.37040510127531884,
+ "grad_norm": 2.6380764656009923,
+ "learning_rate": 1.6704984612636572e-06,
+ "loss": 0.9224,
+ "step": 395
+ },
+ {
+ "epoch": 0.37509377344336087,
+ "grad_norm": 2.4870934478859126,
+ "learning_rate": 1.6402545886389659e-06,
+ "loss": 0.9147,
+ "step": 400
+ },
+ {
+ "epoch": 0.37509377344336087,
+ "eval_loss": 0.9293374419212341,
+ "eval_runtime": 20.325,
+ "eval_samples_per_second": 9.84,
+ "eval_steps_per_second": 2.46,
+ "step": 400
+ },
+ {
+ "epoch": 0.37978244561140284,
+ "grad_norm": 2.5774447639043236,
+ "learning_rate": 1.610573061174586e-06,
+ "loss": 0.9285,
+ "step": 405
+ },
+ {
+ "epoch": 0.38447111777944487,
+ "grad_norm": 2.2678788716470817,
+ "learning_rate": 1.5814475580851346e-06,
+ "loss": 0.9994,
+ "step": 410
+ },
+ {
+ "epoch": 0.38915978994748684,
+ "grad_norm": 2.8722747503884114,
+ "learning_rate": 1.5528717824465089e-06,
+ "loss": 0.8864,
+ "step": 415
+ },
+ {
+ "epoch": 0.3938484621155289,
+ "grad_norm": 2.436535560556849,
+ "learning_rate": 1.5248394612866496e-06,
+ "loss": 1.1302,
+ "step": 420
+ },
+ {
+ "epoch": 0.3985371342835709,
+ "grad_norm": 1.9720574518399083,
+ "learning_rate": 1.4973443456773522e-06,
+ "loss": 0.9394,
+ "step": 425
+ },
+ {
+ "epoch": 0.4032258064516129,
+ "grad_norm": 2.6156706265956258,
+ "learning_rate": 1.4703802108271373e-06,
+ "loss": 0.7922,
+ "step": 430
+ },
+ {
+ "epoch": 0.4079144786196549,
+ "grad_norm": 2.295310925977984,
+ "learning_rate": 1.4439408561752077e-06,
+ "loss": 0.9312,
+ "step": 435
+ },
+ {
+ "epoch": 0.41260315078769694,
+ "grad_norm": 2.147812597078933,
+ "learning_rate": 1.4180201054865116e-06,
+ "loss": 0.9859,
+ "step": 440
+ },
+ {
+ "epoch": 0.4172918229557389,
+ "grad_norm": 2.029009456836816,
+ "learning_rate": 1.392611806947934e-06,
+ "loss": 1.0307,
+ "step": 445
+ },
+ {
+ "epoch": 0.42198049512378094,
+ "grad_norm": 2.590837380389044,
+ "learning_rate": 1.3677098332656357e-06,
+ "loss": 0.92,
+ "step": 450
+ },
+ {
+ "epoch": 0.42666916729182297,
+ "grad_norm": 2.3763009955883505,
+ "learning_rate": 1.3433080817635696e-06,
+ "loss": 1.0955,
+ "step": 455
+ },
+ {
+ "epoch": 0.43135783945986494,
+ "grad_norm": 2.80941534379534,
+ "learning_rate": 1.3194004744831898e-06,
+ "loss": 0.903,
+ "step": 460
+ },
+ {
+ "epoch": 0.436046511627907,
+ "grad_norm": 2.14656443429063,
+ "learning_rate": 1.2959809582843855e-06,
+ "loss": 0.8284,
+ "step": 465
+ },
+ {
+ "epoch": 0.440735183795949,
+ "grad_norm": 2.7676121711339423,
+ "learning_rate": 1.273043504947661e-06,
+ "loss": 1.0462,
+ "step": 470
+ },
+ {
+ "epoch": 0.445423855963991,
+ "grad_norm": 2.44008408182776,
+ "learning_rate": 1.2505821112775862e-06,
+ "loss": 0.979,
+ "step": 475
+ },
+ {
+ "epoch": 0.450112528132033,
+ "grad_norm": 2.584889518443889,
+ "learning_rate": 1.2285907992075474e-06,
+ "loss": 1.0192,
+ "step": 480
+ },
+ {
+ "epoch": 0.450112528132033,
+ "eval_loss": 0.9250276684761047,
+ "eval_runtime": 20.5471,
+ "eval_samples_per_second": 9.734,
+ "eval_steps_per_second": 2.433,
+ "step": 480
+ },
+ {
+ "epoch": 0.45480120030007504,
+ "grad_norm": 2.496373694809835,
+ "learning_rate": 1.207063615905829e-06,
+ "loss": 0.9037,
+ "step": 485
+ },
+ {
+ "epoch": 0.459489872468117,
+ "grad_norm": 2.371873747415523,
+ "learning_rate": 1.1859946338830404e-06,
+ "loss": 1.0312,
+ "step": 490
+ },
+ {
+ "epoch": 0.46417854463615904,
+ "grad_norm": 2.2570285962672894,
+ "learning_rate": 1.1653779511009372e-06,
+ "loss": 0.9113,
+ "step": 495
+ },
+ {
+ "epoch": 0.46886721680420107,
+ "grad_norm": 2.6279198777243606,
+ "learning_rate": 1.145207691082648e-06,
+ "loss": 0.8337,
+ "step": 500
+ },
+ {
+ "epoch": 0.47355588897224304,
+ "grad_norm": 2.7761777412837767,
+ "learning_rate": 1.1254780030243539e-06,
+ "loss": 0.9602,
+ "step": 505
+ },
+ {
+ "epoch": 0.4782445611402851,
+ "grad_norm": 2.5445962149324663,
+ "learning_rate": 1.1061830619084358e-06,
+ "loss": 0.9804,
+ "step": 510
+ },
+ {
+ "epoch": 0.4829332333083271,
+ "grad_norm": 2.3633364319931514,
+ "learning_rate": 1.087317068618139e-06,
+ "loss": 0.977,
+ "step": 515
+ },
+ {
+ "epoch": 0.4876219054763691,
+ "grad_norm": 1.9986528249443352,
+ "learning_rate": 1.0688742500537784e-06,
+ "loss": 0.9425,
+ "step": 520
+ },
+ {
+ "epoch": 0.4923105776444111,
+ "grad_norm": 2.4501412442964803,
+ "learning_rate": 1.0508488592505175e-06,
+ "loss": 0.9995,
+ "step": 525
+ },
+ {
+ "epoch": 0.49699924981245314,
+ "grad_norm": 2.4394994412302786,
+ "learning_rate": 1.0332351754977698e-06,
+ "loss": 0.9329,
+ "step": 530
+ },
+ {
+ "epoch": 0.5016879219804952,
+ "grad_norm": 2.2074019259444544,
+ "learning_rate": 1.016027504460246e-06,
+ "loss": 0.9203,
+ "step": 535
+ },
+ {
+ "epoch": 0.5063765941485371,
+ "grad_norm": 2.610216953517823,
+ "learning_rate": 9.992201783006927e-07,
+ "loss": 1.0291,
+ "step": 540
+ },
+ {
+ "epoch": 0.5110652663165791,
+ "grad_norm": 2.718383436345948,
+ "learning_rate": 9.828075558043617e-07,
+ "loss": 0.9292,
+ "step": 545
+ },
+ {
+ "epoch": 0.5157539384846211,
+ "grad_norm": 3.6229517535213143,
+ "learning_rate": 9.667840225052484e-07,
+ "loss": 1.0165,
+ "step": 550
+ },
+ {
+ "epoch": 0.5204426106526632,
+ "grad_norm": 2.566929441984276,
+ "learning_rate": 9.511439908141446e-07,
+ "loss": 1.012,
+ "step": 555
+ },
+ {
+ "epoch": 0.5251312828207052,
+ "grad_norm": 2.2133821736785015,
+ "learning_rate": 9.358819001485473e-07,
+ "loss": 0.8303,
+ "step": 560
+ },
+ {
+ "epoch": 0.5251312828207052,
+ "eval_loss": 0.9191934466362,
+ "eval_runtime": 20.5275,
+ "eval_samples_per_second": 9.743,
+ "eval_steps_per_second": 2.436,
+ "step": 560
+ },
+ {
+ "epoch": 0.5298199549887472,
+ "grad_norm": 2.530396363584249,
+ "learning_rate": 9.209922170644708e-07,
+ "loss": 1.022,
+ "step": 565
+ },
+ {
+ "epoch": 0.5345086271567892,
+ "grad_norm": 3.0761740344436017,
+ "learning_rate": 9.06469435390206e-07,
+ "loss": 0.9437,
+ "step": 570
+ },
+ {
+ "epoch": 0.5391972993248312,
+ "grad_norm": 2.679955680751534,
+ "learning_rate": 8.923080763620794e-07,
+ "loss": 0.9776,
+ "step": 575
+ },
+ {
+ "epoch": 0.5438859714928732,
+ "grad_norm": 2.477813948279551,
+ "learning_rate": 8.785026887622588e-07,
+ "loss": 0.8858,
+ "step": 580
+ },
+ {
+ "epoch": 0.5485746436609152,
+ "grad_norm": 3.021641574122816,
+ "learning_rate": 8.650478490586582e-07,
+ "loss": 0.9392,
+ "step": 585
+ },
+ {
+ "epoch": 0.5532633158289573,
+ "grad_norm": 3.0622821568316985,
+ "learning_rate": 8.519381615469985e-07,
+ "loss": 1.0067,
+ "step": 590
+ },
+ {
+ "epoch": 0.5579519879969993,
+ "grad_norm": 2.0240771327720606,
+ "learning_rate": 8.391682584950767e-07,
+ "loss": 0.8645,
+ "step": 595
+ },
+ {
+ "epoch": 0.5626406601650412,
+ "grad_norm": 2.4592599613454453,
+ "learning_rate": 8.267328002892997e-07,
+ "loss": 0.8116,
+ "step": 600
+ },
+ {
+ "epoch": 0.5673293323330832,
+ "grad_norm": 2.3279060484215304,
+ "learning_rate": 8.146264755835511e-07,
+ "loss": 1.0685,
+ "step": 605
+ },
+ {
+ "epoch": 0.5720180045011253,
+ "grad_norm": 2.1915343114528394,
+ "learning_rate": 8.028440014504431e-07,
+ "loss": 1.0312,
+ "step": 610
+ },
+ {
+ "epoch": 0.5767066766691673,
+ "grad_norm": 2.353311242466667,
+ "learning_rate": 7.913801235350256e-07,
+ "loss": 0.9753,
+ "step": 615
+ },
+ {
+ "epoch": 0.5813953488372093,
+ "grad_norm": 1.9822254771453949,
+ "learning_rate": 7.80229616211014e-07,
+ "loss": 1.0626,
+ "step": 620
+ },
+ {
+ "epoch": 0.5860840210052514,
+ "grad_norm": 2.229570765456582,
+ "learning_rate": 7.693872827396111e-07,
+ "loss": 0.8915,
+ "step": 625
+ },
+ {
+ "epoch": 0.5907726931732933,
+ "grad_norm": 2.1333212031989017,
+ "learning_rate": 7.58847955430991e-07,
+ "loss": 0.9256,
+ "step": 630
+ },
+ {
+ "epoch": 0.5954613653413353,
+ "grad_norm": 2.3285962864619107,
+ "learning_rate": 7.486064958085216e-07,
+ "loss": 0.8844,
+ "step": 635
+ },
+ {
+ "epoch": 0.6001500375093773,
+ "grad_norm": 2.4239895162113654,
+ "learning_rate": 7.386577947758049e-07,
+ "loss": 1.0284,
+ "step": 640
+ },
+ {
+ "epoch": 0.6001500375093773,
+ "eval_loss": 0.9292100667953491,
+ "eval_runtime": 20.2551,
+ "eval_samples_per_second": 9.874,
+ "eval_steps_per_second": 2.469,
+ "step": 640
+ },
+ {
+ "epoch": 0.6048387096774194,
+ "grad_norm": 2.592229243089013,
+ "learning_rate": 7.289967727866171e-07,
+ "loss": 0.8607,
+ "step": 645
+ },
+ {
+ "epoch": 0.6095273818454614,
+ "grad_norm": 2.557096114560489,
+ "learning_rate": 7.196183800178289e-07,
+ "loss": 1.0461,
+ "step": 650
+ },
+ {
+ "epoch": 0.6142160540135033,
+ "grad_norm": 2.7258364788939278,
+ "learning_rate": 7.105175965454019e-07,
+ "loss": 0.8923,
+ "step": 655
+ },
+ {
+ "epoch": 0.6189047261815454,
+ "grad_norm": 2.500897088452433,
+ "learning_rate": 7.016894325235454e-07,
+ "loss": 0.908,
+ "step": 660
+ },
+ {
+ "epoch": 0.6235933983495874,
+ "grad_norm": 3.4364862124732873,
+ "learning_rate": 6.931289283671353e-07,
+ "loss": 0.8488,
+ "step": 665
+ },
+ {
+ "epoch": 0.6282820705176294,
+ "grad_norm": 2.6544071650230405,
+ "learning_rate": 6.84831154937491e-07,
+ "loss": 0.815,
+ "step": 670
+ },
+ {
+ "epoch": 0.6329707426856714,
+ "grad_norm": 2.9017321933643356,
+ "learning_rate": 6.767912137316187e-07,
+ "loss": 1.0641,
+ "step": 675
+ },
+ {
+ "epoch": 0.6376594148537135,
+ "grad_norm": 3.2461446427637854,
+ "learning_rate": 6.690042370750264e-07,
+ "loss": 0.9388,
+ "step": 680
+ },
+ {
+ "epoch": 0.6423480870217554,
+ "grad_norm": 2.5936370960610398,
+ "learning_rate": 6.614653883182271e-07,
+ "loss": 0.83,
+ "step": 685
+ },
+ {
+ "epoch": 0.6470367591897974,
+ "grad_norm": 3.027451608426823,
+ "learning_rate": 6.541698620370481e-07,
+ "loss": 0.9852,
+ "step": 690
+ },
+ {
+ "epoch": 0.6517254313578394,
+ "grad_norm": 2.8965181885799933,
+ "learning_rate": 6.471128842368711e-07,
+ "loss": 0.8914,
+ "step": 695
+ },
+ {
+ "epoch": 0.6564141035258815,
+ "grad_norm": 2.9030689776107,
+ "learning_rate": 6.402897125609332e-07,
+ "loss": 0.9833,
+ "step": 700
+ },
+ {
+ "epoch": 0.6611027756939235,
+ "grad_norm": 2.6265372108049303,
+ "learning_rate": 6.336956365028259e-07,
+ "loss": 1.0902,
+ "step": 705
+ },
+ {
+ "epoch": 0.6657914478619655,
+ "grad_norm": 2.186738270502514,
+ "learning_rate": 6.273259776233337e-07,
+ "loss": 0.8316,
+ "step": 710
+ },
+ {
+ "epoch": 0.6704801200300075,
+ "grad_norm": 10.89006237530139,
+ "learning_rate": 6.211760897717641e-07,
+ "loss": 1.0283,
+ "step": 715
+ },
+ {
+ "epoch": 0.6751687921980495,
+ "grad_norm": 2.7796220586562344,
+ "learning_rate": 6.152413593119235e-07,
+ "loss": 0.9183,
+ "step": 720
+ },
+ {
+ "epoch": 0.6751687921980495,
+ "eval_loss": 0.9389083385467529,
+ "eval_runtime": 20.3673,
+ "eval_samples_per_second": 9.82,
+ "eval_steps_per_second": 2.455,
+ "step": 720
+ },
+ {
+ "epoch": 0.6798574643660915,
+ "grad_norm": 2.906995236208513,
+ "learning_rate": 6.095172053529076e-07,
+ "loss": 1.034,
+ "step": 725
+ },
+ {
+ "epoch": 0.6845461365341335,
+ "grad_norm": 2.3367410989611943,
+ "learning_rate": 6.039990799848741e-07,
+ "loss": 0.898,
+ "step": 730
+ },
+ {
+ "epoch": 0.6892348087021756,
+ "grad_norm": 2.3266907504669976,
+ "learning_rate": 5.986824685199863e-07,
+ "loss": 0.7855,
+ "step": 735
+ },
+ {
+ "epoch": 0.6939234808702176,
+ "grad_norm": 2.1566038379464842,
+ "learning_rate": 5.935628897387149e-07,
+ "loss": 1.0417,
+ "step": 740
+ },
+ {
+ "epoch": 0.6986121530382595,
+ "grad_norm": 2.3519944231674685,
1122
+ "learning_rate": 5.886358961416999e-07,
1123
+ "loss": 0.9567,
1124
+ "step": 745
1125
+ },
1126
+ {
1127
+ "epoch": 0.7033008252063015,
1128
+ "grad_norm": 2.5749570284624457,
1129
+ "learning_rate": 5.838970742073876e-07,
1130
+ "loss": 0.7935,
1131
+ "step": 750
1132
+ },
1133
+ {
1134
+ "epoch": 0.7079894973743436,
1135
+ "grad_norm": 2.0032057277402844,
1136
+ "learning_rate": 5.793420446556638e-07,
1137
+ "loss": 0.9967,
1138
+ "step": 755
1139
+ },
1140
+ {
1141
+ "epoch": 0.7126781695423856,
1142
+ "grad_norm": 2.8692030696523703,
1143
+ "learning_rate": 5.74966462717722e-07,
1144
+ "loss": 1.0786,
1145
+ "step": 760
1146
+ },
1147
+ {
1148
+ "epoch": 0.7173668417104276,
1149
+ "grad_norm": 2.1378421674086994,
1150
+ "learning_rate": 5.707660184124143e-07,
1151
+ "loss": 1.115,
1152
+ "step": 765
1153
+ },
1154
+ {
1155
+ "epoch": 0.7220555138784697,
1156
+ "grad_norm": 2.344543312884903,
1157
+ "learning_rate": 5.667364368293497e-07,
1158
+ "loss": 0.9502,
1159
+ "step": 770
1160
+ },
1161
+ {
1162
+ "epoch": 0.7267441860465116,
1163
+ "grad_norm": 1.9323329729740275,
1164
+ "learning_rate": 5.6287347841902e-07,
1165
+ "loss": 0.9964,
1166
+ "step": 775
1167
+ },
1168
+ {
1169
+ "epoch": 0.7314328582145536,
1170
+ "grad_norm": 2.948582260278745,
1171
+ "learning_rate": 5.591729392902467e-07,
1172
+ "loss": 1.0077,
1173
+ "step": 780
1174
+ },
1175
+ {
1176
+ "epoch": 0.7361215303825956,
1177
+ "grad_norm": 2.1985962635350442,
1178
+ "learning_rate": 5.556306515152638e-07,
1179
+ "loss": 0.7058,
1180
+ "step": 785
1181
+ },
1182
+ {
1183
+ "epoch": 0.7408102025506377,
1184
+ "grad_norm": 4.099442910663203,
1185
+ "learning_rate": 5.522424834427688e-07,
1186
+ "loss": 0.856,
1187
+ "step": 790
1188
+ },
1189
+ {
1190
+ "epoch": 0.7454988747186797,
1191
+ "grad_norm": 2.3260610179486787,
1192
+ "learning_rate": 5.490043400192936e-07,
1193
+ "loss": 0.9852,
1194
+ "step": 795
1195
+ },
1196
+ {
1197
+ "epoch": 0.7501875468867217,
1198
+ "grad_norm": 2.2249724666107693,
1199
+ "learning_rate": 5.459121631192727e-07,
1200
+ "loss": 0.9897,
1201
+ "step": 800
1202
+ },
1203
+ {
1204
+ "epoch": 0.7501875468867217,
1205
+ "eval_loss": 0.9337027072906494,
1206
+ "eval_runtime": 20.543,
1207
+ "eval_samples_per_second": 9.736,
1208
+ "eval_steps_per_second": 2.434,
1209
+ "step": 800
1210
+ },
+ {
+ "epoch": 0.7548762190547637,
+ "grad_norm": 2.3238862226735244,
+ "learning_rate": 5.429619318842062e-07,
+ "loss": 0.9693,
+ "step": 805
+ },
+ {
+ "epoch": 0.7595648912228057,
+ "grad_norm": 2.5983979780721005,
+ "learning_rate": 5.401496630713439e-07,
+ "loss": 0.9363,
+ "step": 810
+ },
+ {
+ "epoch": 0.7642535633908477,
+ "grad_norm": 3.203392479652258,
+ "learning_rate": 5.374714114123462e-07,
+ "loss": 0.9007,
+ "step": 815
+ },
+ {
+ "epoch": 0.7689422355588897,
+ "grad_norm": 3.0116714434819523,
+ "learning_rate": 5.34923269982403e-07,
+ "loss": 0.8538,
+ "step": 820
+ },
+ {
+ "epoch": 0.7736309077269318,
+ "grad_norm": 2.735036803752382,
+ "learning_rate": 5.325013705803326e-07,
+ "loss": 0.9429,
+ "step": 825
+ },
+ {
+ "epoch": 0.7783195798949737,
+ "grad_norm": 2.7192900196409693,
+ "learning_rate": 5.302018841202155e-07,
+ "loss": 0.862,
+ "step": 830
+ },
+ {
+ "epoch": 0.7830082520630157,
+ "grad_norm": 3.310126483594716,
+ "learning_rate": 5.28021021035156e-07,
+ "loss": 0.8743,
+ "step": 835
+ },
+ {
+ "epoch": 0.7876969242310577,
+ "grad_norm": 3.5788231445835623,
+ "learning_rate": 5.25955031693814e-07,
+ "loss": 0.8728,
+ "step": 840
+ },
+ {
+ "epoch": 0.7923855963990998,
+ "grad_norm": 2.3031948200618784,
+ "learning_rate": 5.240002068303935e-07,
+ "loss": 0.9804,
+ "step": 845
+ },
+ {
+ "epoch": 0.7970742685671418,
+ "grad_norm": 2.27441145876717,
+ "learning_rate": 5.22152877988829e-07,
+ "loss": 0.9592,
+ "step": 850
+ },
+ {
+ "epoch": 0.8017629407351838,
+ "grad_norm": 2.399760421263353,
+ "learning_rate": 5.204094179819663e-07,
+ "loss": 1.1113,
+ "step": 855
+ },
+ {
+ "epoch": 0.8064516129032258,
+ "grad_norm": 2.786650153581553,
+ "learning_rate": 5.187662413666055e-07,
+ "loss": 0.896,
+ "step": 860
+ },
+ {
+ "epoch": 0.8111402850712678,
+ "grad_norm": 2.3375248528715886,
+ "learning_rate": 5.137741469209312e-07,
+ "loss": 0.9484,
+ "step": 865
+ },
+ {
+ "epoch": 0.8158289572393098,
+ "grad_norm": 2.107239092918484,
+ "learning_rate": 5.034812576940423e-07,
+ "loss": 0.9107,
+ "step": 870
+ },
+ {
+ "epoch": 0.8205176294073518,
+ "grad_norm": 3.2120652639502136,
+ "learning_rate": 4.931883684671535e-07,
+ "loss": 1.0274,
+ "step": 875
+ },
+ {
+ "epoch": 0.8252063015753939,
+ "grad_norm": 2.1470972788703353,
+ "learning_rate": 4.828954792402647e-07,
+ "loss": 1.0209,
+ "step": 880
+ },
+ {
+ "epoch": 0.8252063015753939,
+ "eval_loss": 0.93896484375,
+ "eval_runtime": 20.3111,
+ "eval_samples_per_second": 9.847,
+ "eval_steps_per_second": 2.462,
+ "step": 880
+ },
+ {
+ "epoch": 0.8298949737434359,
+ "grad_norm": 2.0424483656754724,
+ "learning_rate": 4.7260259001337577e-07,
+ "loss": 0.9472,
+ "step": 885
+ },
+ {
+ "epoch": 0.8345836459114778,
+ "grad_norm": 2.5878530403298434,
+ "learning_rate": 4.6230970078648696e-07,
+ "loss": 0.9458,
+ "step": 890
+ },
+ {
+ "epoch": 0.8392723180795199,
+ "grad_norm": 2.497909901428293,
+ "learning_rate": 4.5201681155959816e-07,
+ "loss": 0.7928,
+ "step": 895
+ },
+ {
+ "epoch": 0.8439609902475619,
+ "grad_norm": 2.1704945555054462,
+ "learning_rate": 4.417239223327093e-07,
+ "loss": 0.8992,
+ "step": 900
+ },
+ {
+ "epoch": 0.8486496624156039,
+ "grad_norm": 1.9807380776486385,
+ "learning_rate": 4.314310331058205e-07,
+ "loss": 1.0681,
+ "step": 905
+ },
+ {
+ "epoch": 0.8533383345836459,
+ "grad_norm": 2.2057348642236625,
+ "learning_rate": 4.2113814387893164e-07,
+ "loss": 1.057,
+ "step": 910
+ },
+ {
+ "epoch": 0.858027006751688,
+ "grad_norm": 2.434197081864654,
+ "learning_rate": 4.108452546520428e-07,
+ "loss": 0.9609,
+ "step": 915
+ },
+ {
+ "epoch": 0.8627156789197299,
+ "grad_norm": 1.641371038860168,
+ "learning_rate": 4.00552365425154e-07,
+ "loss": 0.9775,
+ "step": 920
+ },
+ {
+ "epoch": 0.8674043510877719,
+ "grad_norm": 2.207714901698735,
+ "learning_rate": 3.9025947619826517e-07,
+ "loss": 0.7712,
+ "step": 925
+ },
+ {
+ "epoch": 0.872093023255814,
+ "grad_norm": 2.011772170698376,
+ "learning_rate": 3.7996658697137626e-07,
+ "loss": 0.9408,
+ "step": 930
+ },
+ {
+ "epoch": 0.876781695423856,
+ "grad_norm": 2.5252414435017747,
+ "learning_rate": 3.6967369774448745e-07,
+ "loss": 0.8363,
+ "step": 935
+ },
+ {
+ "epoch": 0.881470367591898,
+ "grad_norm": 2.3365392350792287,
+ "learning_rate": 3.5938080851759865e-07,
+ "loss": 0.9977,
+ "step": 940
+ },
+ {
+ "epoch": 0.88615903975994,
+ "grad_norm": 2.3882731527862346,
+ "learning_rate": 3.490879192907099e-07,
+ "loss": 0.9326,
+ "step": 945
+ },
+ {
+ "epoch": 0.890847711927982,
+ "grad_norm": 2.180049227770247,
+ "learning_rate": 3.38795030063821e-07,
+ "loss": 0.9909,
+ "step": 950
+ },
+ {
+ "epoch": 0.895536384096024,
+ "grad_norm": 2.4546116544168965,
+ "learning_rate": 3.285021408369321e-07,
+ "loss": 0.8207,
+ "step": 955
+ },
+ {
+ "epoch": 0.900225056264066,
+ "grad_norm": 3.062713753398875,
+ "learning_rate": 3.1820925161004326e-07,
+ "loss": 0.9118,
+ "step": 960
+ },
+ {
+ "epoch": 0.900225056264066,
+ "eval_loss": 0.9374144077301025,
+ "eval_runtime": 20.3928,
+ "eval_samples_per_second": 9.807,
+ "eval_steps_per_second": 2.452,
+ "step": 960
+ },
+ {
+ "epoch": 0.904913728432108,
+ "grad_norm": 2.9305276635268633,
+ "learning_rate": 3.0791636238315446e-07,
+ "loss": 0.8345,
+ "step": 965
+ },
+ {
+ "epoch": 0.9096024006001501,
+ "grad_norm": 2.9508689565092046,
+ "learning_rate": 2.9762347315626565e-07,
+ "loss": 0.7999,
+ "step": 970
+ },
+ {
+ "epoch": 0.9142910727681921,
+ "grad_norm": 3.9095096230897237,
+ "learning_rate": 2.873305839293768e-07,
+ "loss": 0.8812,
+ "step": 975
+ },
+ {
+ "epoch": 0.918979744936234,
+ "grad_norm": 2.954610613967905,
+ "learning_rate": 2.7703769470248794e-07,
+ "loss": 0.9124,
+ "step": 980
+ },
+ {
+ "epoch": 0.923668417104276,
+ "grad_norm": 2.3909984231204446,
+ "learning_rate": 2.6674480547559913e-07,
+ "loss": 0.8418,
+ "step": 985
+ },
+ {
+ "epoch": 0.9283570892723181,
+ "grad_norm": 2.407545423074375,
+ "learning_rate": 2.5645191624871027e-07,
+ "loss": 0.9194,
+ "step": 990
+ },
+ {
+ "epoch": 0.9330457614403601,
+ "grad_norm": 2.868767477778071,
+ "learning_rate": 2.4615902702182147e-07,
+ "loss": 0.9541,
+ "step": 995
+ },
+ {
+ "epoch": 0.9377344336084021,
+ "grad_norm": 2.1293635803564066,
+ "learning_rate": 2.3586613779493258e-07,
+ "loss": 0.8087,
+ "step": 1000
+ },
+ {
+ "epoch": 0.9424231057764441,
+ "grad_norm": 2.0557494272602264,
+ "learning_rate": 2.255732485680438e-07,
+ "loss": 0.9081,
+ "step": 1005
+ },
+ {
+ "epoch": 0.9471117779444861,
+ "grad_norm": 2.1864930223853056,
+ "learning_rate": 2.1528035934115495e-07,
+ "loss": 0.859,
+ "step": 1010
+ },
+ {
+ "epoch": 0.9518004501125281,
+ "grad_norm": 2.05122621453041,
+ "learning_rate": 2.0498747011426614e-07,
+ "loss": 0.8725,
+ "step": 1015
+ },
+ {
+ "epoch": 0.9564891222805701,
+ "grad_norm": 3.2726273841892923,
+ "learning_rate": 1.9469458088737728e-07,
+ "loss": 0.7904,
+ "step": 1020
+ },
+ {
+ "epoch": 0.9611777944486122,
+ "grad_norm": 2.259537550017741,
+ "learning_rate": 1.8440169166048842e-07,
+ "loss": 1.0669,
+ "step": 1025
+ },
+ {
+ "epoch": 0.9658664666166542,
+ "grad_norm": 2.406310953068651,
+ "learning_rate": 1.7410880243359964e-07,
+ "loss": 0.8434,
+ "step": 1030
+ },
+ {
+ "epoch": 0.9705551387846961,
+ "grad_norm": 1.9259292180053644,
+ "learning_rate": 1.6381591320671076e-07,
+ "loss": 1.0234,
+ "step": 1035
+ },
+ {
+ "epoch": 0.9752438109527382,
+ "grad_norm": 2.0241322819114504,
+ "learning_rate": 1.5352302397982195e-07,
+ "loss": 0.9077,
+ "step": 1040
+ },
+ {
+ "epoch": 0.9752438109527382,
+ "eval_loss": 0.9355975389480591,
+ "eval_runtime": 20.3082,
+ "eval_samples_per_second": 9.848,
+ "eval_steps_per_second": 2.462,
+ "step": 1040
+ },
+ {
+ "epoch": 0.9799324831207802,
+ "grad_norm": 1.9205593868577922,
+ "learning_rate": 1.432301347529331e-07,
+ "loss": 0.7646,
+ "step": 1045
+ },
+ {
+ "epoch": 0.9846211552888222,
+ "grad_norm": 3.2012232274694834,
+ "learning_rate": 1.329372455260443e-07,
+ "loss": 0.8593,
+ "step": 1050
+ },
+ {
+ "epoch": 0.9893098274568642,
+ "grad_norm": 2.3484948634638205,
+ "learning_rate": 1.2264435629915543e-07,
+ "loss": 1.0972,
+ "step": 1055
+ },
+ {
+ "epoch": 0.9939984996249063,
+ "grad_norm": 2.7160625332606685,
+ "learning_rate": 1.1235146707226661e-07,
+ "loss": 0.7634,
+ "step": 1060
+ },
+ {
+ "epoch": 0.9986871717929482,
+ "grad_norm": 2.3964216948732173,
+ "learning_rate": 1.0205857784537777e-07,
+ "loss": 0.9783,
+ "step": 1065
+ },
+ {
+ "epoch": 0.9996249062265566,
+ "step": 1066,
+ "total_flos": 1.5465201990107136e+17,
+ "train_loss": 0.9640034305221815,
+ "train_runtime": 13404.8173,
+ "train_samples_per_second": 2.386,
+ "train_steps_per_second": 0.08
+ }
+ ],
+ "logging_steps": 5,
+ "max_steps": 1066,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 1,
+ "save_steps": 1066,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 1.5465201990107136e+17,
+ "train_batch_size": 3,
+ "trial_name": null,
+ "trial_params": null
+ }