BrownEnergy committed on
Commit b860c19 · verified · 1 Parent(s): 1e1263d

Upload folder using huggingface_hub

Files changed (8)
  1. README.md +95 -0
  2. metadata.json +8 -0
  3. optimizer.pt +3 -0
  4. pytorch_model.bin +3 -0
  5. rng_state.pth +3 -0
  6. scheduler.pt +3 -0
  7. trainer_state.json +787 -0
  8. training_args.bin +3 -0
README.md ADDED
@@ -0,0 +1,95 @@
+ ---
+ license: apache-2.0
+ base_model: google/vit-base-patch16-224
+ tags:
+ - Image Regression
+ datasets:
+ - "BrownEnergy/secchi_depth"
+ metrics:
+ - accuracy
+ model-index:
+ - name: "sd_depth_regression"
+   results: []
+ ---
+
+ # sd_depth_regression
+ ## Image Regression Model
+
+ This model was trained with [Image Regression Model Trainer](https://github.com/TonyAssi/ImageRegression/tree/main). It takes an image as input and outputs a float value.
+
+ ```python
+ from ImageRegression import predict
+ predict(repo_id='BrownEnergy/sd_depth_regression', image_path='image.jpg')
+ ```
+
+ ---
+
+ ## Dataset
+ Dataset: BrownEnergy/secchi_depth\
+ Value Column: 'sd_depth'\
+ Train Test Split: 0.2
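For reference, the split above can be reproduced directly with the 🤗 `datasets` library. This is a minimal sketch; the existence of a `train` split and of an image column alongside `sd_depth` is an assumption, since only the value column is documented here.

```python
# Minimal sketch: load the dataset and reproduce the 0.2 train/test split.
# Assumes a 'train' split and an image column next to 'sd_depth'.
from datasets import load_dataset

ds = load_dataset("BrownEnergy/secchi_depth", split="train")
splits = ds.train_test_split(test_size=0.2)
train_ds, test_ds = splits["train"], splits["test"]
print(len(train_ds), len(test_ds), train_ds[0]["sd_depth"])
```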
+
+ ---
+
+ ## Training
+ Base Model: [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224)\
+ Epochs: 10\
+ Learning Rate: 0.0001
+
+ ---
+
+ ## Usage
+
+ ### Download
+ ```bash
+ git clone https://github.com/TonyAssi/ImageRegression.git
+ cd ImageRegression
+ ```
+
+ ### Installation
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### Import
+ ```python
+ from ImageRegression import train_model, upload_model, predict
+ ```
+
+ ### Inference (Prediction)
+ - **repo_id** 🤗 repo id of the model
+ - **image_path** path to the image
+ ```python
+ predict(repo_id='BrownEnergy/sd_depth_regression',
+         image_path='image.jpg')
+ ```
+ The first time this function is called, it downloads the safetensors model; subsequent calls run faster.
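Since the model card states that the model outputs a float per image, batch inference is just a loop over `predict`. A minimal sketch follows; the `images/` folder is an illustrative path, and it assumes `predict` returns its value rather than only printing it.

```python
# Minimal sketch: predict a value for every .jpg in a folder.
# 'images/' is an illustrative path; assumes predict() returns the float value.
from pathlib import Path
from ImageRegression import predict

results = {}
for img_path in sorted(Path("images").glob("*.jpg")):
    results[img_path.name] = predict(repo_id='BrownEnergy/sd_depth_regression',
                                     image_path=str(img_path))
print(results)
```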
+
+ ### Train Model
+ - **dataset_id** 🤗 dataset id
+ - **value_column_name** column name of the prediction values in the dataset
+ - **test_split** test split of the train/test split
+ - **output_dir** the directory where the checkpoints will be saved
+ - **num_train_epochs** number of training epochs
+ - **learning_rate** learning rate
+ ```python
+ train_model(dataset_id='BrownEnergy/secchi_depth',
+             value_column_name='sd_depth',
+             test_split=0.2,
+             output_dir='./results',
+             num_train_epochs=10,
+             learning_rate=0.0001)
+ ```
+ The trainer saves the checkpoints in the output_dir location. The model.safetensors file contains the trained weights you'll use for inference (prediction).
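If several checkpoints accumulate under `./results`, a small helper can pick the latest one to hand to `upload_model` below. This is a minimal sketch, assuming the usual `checkpoint-<step>` folder naming seen in this repo (e.g. checkpoint-940).

```python
# Minimal sketch: locate the most recent checkpoint folder under ./results.
# Assumes the standard 'checkpoint-<global_step>' naming convention.
from pathlib import Path

checkpoints = sorted(Path("./results").glob("checkpoint-*"),
                     key=lambda p: int(p.name.split("-")[-1]))
latest_checkpoint = str(checkpoints[-1]) if checkpoints else None
print(latest_checkpoint)  # e.g. results/checkpoint-940
```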
+
+ ### Upload Model
+ This function uploads your model to the 🤗 Hub.
+ - **model_id** the model id (name) for the uploaded model
+ - **token** go [here](https://huggingface.co/settings/tokens) to create a new 🤗 token
+ - **checkpoint_dir** checkpoint folder that will be uploaded
+ ```python
+ upload_model(model_id='sd_depth_regression',
+              token='YOUR_HF_TOKEN',
+              checkpoint_dir='./results/checkpoint-940')
+ ```
metadata.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "dataset_id": "BrownEnergy/secchi_depth",
+ "value_column_name": "sd_depth",
+ "test_split": 0.2,
+ "num_train_epochs": 10,
+ "learning_rate": 0.0001,
+ "max_value": 77.0
+ }
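A short sketch of reading this metadata at analysis time; note that treating `max_value` (77.0) as a scaling constant for the regression target is an assumption based only on the field name, not something stated in this commit.

```python
# Minimal sketch: read the training metadata stored alongside the model.
# Interpreting max_value as a target-scaling constant is an assumption.
import json

with open("metadata.json") as f:
    meta = json.load(f)

print(meta["dataset_id"], meta["value_column_name"], meta["test_split"])
max_value = meta["max_value"]  # 77.0 in this commit
```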
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:03e3e95f99f3e03fe5e0c197c2e4f546b516aa38cbd2949822e45963121edab3
+ size 686507205
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a4c42a4f164899951641c143cac75824df22508a3c780ca73a0a12409cd35159
+ size 345639733
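The binary files in this commit are Git LFS pointers. If you need the raw checkpoint files rather than going through `predict`, they can be fetched with `huggingface_hub`; a sketch, assuming they sit at the root of the `BrownEnergy/sd_depth_regression` repo referenced in the README.

```python
# Minimal sketch: download an LFS-backed file from the Hub.
# The repo id is assumed from the README; the filename matches this commit.
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(repo_id="BrownEnergy/sd_depth_regression",
                             filename="pytorch_model.bin")
print(local_path)  # path to the cached download
```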
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e69db2ebd3dbe75c8467b788e5787cf93796fa78adc7a0c39fc13316b0348a38
+ size 13553
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:66bc07f61d019d5f0708e8d8d34df8003c421ef1510e4f63cb182d9eb2229c5c
+ size 627
trainer_state.json ADDED
@@ -0,0 +1,787 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 10.0,
5
+ "global_step": 1150,
6
+ "is_hyper_param_search": false,
7
+ "is_local_process_zero": true,
8
+ "is_world_process_zero": true,
9
+ "log_history": [
10
+ {
11
+ "epoch": 0.09,
12
+ "learning_rate": 9.91304347826087e-05,
13
+ "loss": 0.1114,
14
+ "step": 10
15
+ },
16
+ {
17
+ "epoch": 0.17,
18
+ "learning_rate": 9.82608695652174e-05,
19
+ "loss": 0.0247,
20
+ "step": 20
21
+ },
22
+ {
23
+ "epoch": 0.26,
24
+ "learning_rate": 9.739130434782609e-05,
25
+ "loss": 0.0153,
26
+ "step": 30
27
+ },
28
+ {
29
+ "epoch": 0.35,
30
+ "learning_rate": 9.652173913043479e-05,
31
+ "loss": 0.0139,
32
+ "step": 40
33
+ },
34
+ {
35
+ "epoch": 0.43,
36
+ "learning_rate": 9.565217391304348e-05,
37
+ "loss": 0.0117,
38
+ "step": 50
39
+ },
40
+ {
41
+ "epoch": 0.52,
42
+ "learning_rate": 9.478260869565218e-05,
43
+ "loss": 0.0029,
44
+ "step": 60
45
+ },
46
+ {
47
+ "epoch": 0.61,
48
+ "learning_rate": 9.391304347826087e-05,
49
+ "loss": 0.0031,
50
+ "step": 70
51
+ },
52
+ {
53
+ "epoch": 0.7,
54
+ "learning_rate": 9.304347826086957e-05,
55
+ "loss": 0.0175,
56
+ "step": 80
57
+ },
58
+ {
59
+ "epoch": 0.78,
60
+ "learning_rate": 9.217391304347827e-05,
61
+ "loss": 0.0141,
62
+ "step": 90
63
+ },
64
+ {
65
+ "epoch": 0.87,
66
+ "learning_rate": 9.130434782608696e-05,
67
+ "loss": 0.0073,
68
+ "step": 100
69
+ },
70
+ {
71
+ "epoch": 0.96,
72
+ "learning_rate": 9.043478260869566e-05,
73
+ "loss": 0.0067,
74
+ "step": 110
75
+ },
76
+ {
77
+ "epoch": 1.0,
78
+ "eval_loss": 0.005401695612818003,
79
+ "eval_mse": 0.005401696544140577,
80
+ "eval_runtime": 184.1257,
81
+ "eval_samples_per_second": 1.249,
82
+ "eval_steps_per_second": 0.158,
83
+ "step": 115
84
+ },
85
+ {
86
+ "epoch": 1.04,
87
+ "learning_rate": 8.956521739130435e-05,
88
+ "loss": 0.0041,
89
+ "step": 120
90
+ },
91
+ {
92
+ "epoch": 1.13,
93
+ "learning_rate": 8.869565217391305e-05,
94
+ "loss": 0.009,
95
+ "step": 130
96
+ },
97
+ {
98
+ "epoch": 1.22,
99
+ "learning_rate": 8.782608695652174e-05,
100
+ "loss": 0.0031,
101
+ "step": 140
102
+ },
103
+ {
104
+ "epoch": 1.3,
105
+ "learning_rate": 8.695652173913044e-05,
106
+ "loss": 0.0133,
107
+ "step": 150
108
+ },
109
+ {
110
+ "epoch": 1.39,
111
+ "learning_rate": 8.608695652173914e-05,
112
+ "loss": 0.0022,
113
+ "step": 160
114
+ },
115
+ {
116
+ "epoch": 1.48,
117
+ "learning_rate": 8.521739130434783e-05,
118
+ "loss": 0.0019,
119
+ "step": 170
120
+ },
121
+ {
122
+ "epoch": 1.57,
123
+ "learning_rate": 8.434782608695653e-05,
124
+ "loss": 0.0023,
125
+ "step": 180
126
+ },
127
+ {
128
+ "epoch": 1.65,
129
+ "learning_rate": 8.347826086956521e-05,
130
+ "loss": 0.0018,
131
+ "step": 190
132
+ },
133
+ {
134
+ "epoch": 1.74,
135
+ "learning_rate": 8.260869565217392e-05,
136
+ "loss": 0.0018,
137
+ "step": 200
138
+ },
139
+ {
140
+ "epoch": 1.83,
141
+ "learning_rate": 8.173913043478262e-05,
142
+ "loss": 0.0036,
143
+ "step": 210
144
+ },
145
+ {
146
+ "epoch": 1.91,
147
+ "learning_rate": 8.086956521739131e-05,
148
+ "loss": 0.0067,
149
+ "step": 220
150
+ },
151
+ {
152
+ "epoch": 2.0,
153
+ "learning_rate": 8e-05,
154
+ "loss": 0.0079,
155
+ "step": 230
156
+ },
157
+ {
158
+ "epoch": 2.0,
159
+ "eval_loss": 0.006907076574862003,
160
+ "eval_mse": 0.006907076574862003,
161
+ "eval_runtime": 2268.8659,
162
+ "eval_samples_per_second": 0.101,
163
+ "eval_steps_per_second": 0.013,
164
+ "step": 230
165
+ },
166
+ {
167
+ "epoch": 2.09,
168
+ "learning_rate": 7.91304347826087e-05,
169
+ "loss": 0.0031,
170
+ "step": 240
171
+ },
172
+ {
173
+ "epoch": 2.17,
174
+ "learning_rate": 7.82608695652174e-05,
175
+ "loss": 0.0066,
176
+ "step": 250
177
+ },
178
+ {
179
+ "epoch": 2.26,
180
+ "learning_rate": 7.73913043478261e-05,
181
+ "loss": 0.0032,
182
+ "step": 260
183
+ },
184
+ {
185
+ "epoch": 2.35,
186
+ "learning_rate": 7.652173913043479e-05,
187
+ "loss": 0.0028,
188
+ "step": 270
189
+ },
190
+ {
191
+ "epoch": 2.43,
192
+ "learning_rate": 7.565217391304347e-05,
193
+ "loss": 0.0067,
194
+ "step": 280
195
+ },
196
+ {
197
+ "epoch": 2.52,
198
+ "learning_rate": 7.478260869565218e-05,
199
+ "loss": 0.0029,
200
+ "step": 290
201
+ },
202
+ {
203
+ "epoch": 2.61,
204
+ "learning_rate": 7.391304347826086e-05,
205
+ "loss": 0.0017,
206
+ "step": 300
207
+ },
208
+ {
209
+ "epoch": 2.7,
210
+ "learning_rate": 7.304347826086957e-05,
211
+ "loss": 0.0019,
212
+ "step": 310
213
+ },
214
+ {
215
+ "epoch": 2.78,
216
+ "learning_rate": 7.217391304347827e-05,
217
+ "loss": 0.013,
218
+ "step": 320
219
+ },
220
+ {
221
+ "epoch": 2.87,
222
+ "learning_rate": 7.130434782608696e-05,
223
+ "loss": 0.0048,
224
+ "step": 330
225
+ },
226
+ {
227
+ "epoch": 2.96,
228
+ "learning_rate": 7.043478260869566e-05,
229
+ "loss": 0.0033,
230
+ "step": 340
231
+ },
232
+ {
233
+ "epoch": 3.0,
234
+ "eval_loss": 0.005840361583977938,
235
+ "eval_mse": 0.005840362515300512,
236
+ "eval_runtime": 179.4443,
237
+ "eval_samples_per_second": 1.282,
238
+ "eval_steps_per_second": 0.162,
239
+ "step": 345
240
+ },
241
+ {
242
+ "epoch": 3.04,
243
+ "learning_rate": 6.956521739130436e-05,
244
+ "loss": 0.0062,
245
+ "step": 350
246
+ },
247
+ {
248
+ "epoch": 3.13,
249
+ "learning_rate": 6.869565217391305e-05,
250
+ "loss": 0.0019,
251
+ "step": 360
252
+ },
253
+ {
254
+ "epoch": 3.22,
255
+ "learning_rate": 6.782608695652173e-05,
256
+ "loss": 0.008,
257
+ "step": 370
258
+ },
259
+ {
260
+ "epoch": 3.3,
261
+ "learning_rate": 6.695652173913044e-05,
262
+ "loss": 0.0033,
263
+ "step": 380
264
+ },
265
+ {
266
+ "epoch": 3.39,
267
+ "learning_rate": 6.608695652173912e-05,
268
+ "loss": 0.0034,
269
+ "step": 390
270
+ },
271
+ {
272
+ "epoch": 3.48,
273
+ "learning_rate": 6.521739130434783e-05,
274
+ "loss": 0.002,
275
+ "step": 400
276
+ },
277
+ {
278
+ "epoch": 3.57,
279
+ "learning_rate": 6.434782608695652e-05,
280
+ "loss": 0.0012,
281
+ "step": 410
282
+ },
283
+ {
284
+ "epoch": 3.65,
285
+ "learning_rate": 6.347826086956523e-05,
286
+ "loss": 0.0166,
287
+ "step": 420
288
+ },
289
+ {
290
+ "epoch": 3.74,
291
+ "learning_rate": 6.260869565217392e-05,
292
+ "loss": 0.0039,
293
+ "step": 430
294
+ },
295
+ {
296
+ "epoch": 3.83,
297
+ "learning_rate": 6.173913043478262e-05,
298
+ "loss": 0.0016,
299
+ "step": 440
300
+ },
301
+ {
302
+ "epoch": 3.91,
303
+ "learning_rate": 6.086956521739131e-05,
304
+ "loss": 0.0016,
305
+ "step": 450
306
+ },
307
+ {
308
+ "epoch": 4.0,
309
+ "learning_rate": 6e-05,
310
+ "loss": 0.0011,
311
+ "step": 460
312
+ },
313
+ {
314
+ "epoch": 4.0,
315
+ "eval_loss": 0.005455708596855402,
316
+ "eval_mse": 0.005455708596855402,
317
+ "eval_runtime": 69.7545,
318
+ "eval_samples_per_second": 3.297,
319
+ "eval_steps_per_second": 0.416,
320
+ "step": 460
321
+ },
322
+ {
323
+ "epoch": 4.09,
324
+ "learning_rate": 5.9130434782608704e-05,
325
+ "loss": 0.003,
326
+ "step": 470
327
+ },
328
+ {
329
+ "epoch": 4.17,
330
+ "learning_rate": 5.826086956521739e-05,
331
+ "loss": 0.001,
332
+ "step": 480
333
+ },
334
+ {
335
+ "epoch": 4.26,
336
+ "learning_rate": 5.739130434782609e-05,
337
+ "loss": 0.0047,
338
+ "step": 490
339
+ },
340
+ {
341
+ "epoch": 4.35,
342
+ "learning_rate": 5.652173913043478e-05,
343
+ "loss": 0.0019,
344
+ "step": 500
345
+ },
346
+ {
347
+ "epoch": 4.43,
348
+ "learning_rate": 5.565217391304348e-05,
349
+ "loss": 0.0019,
350
+ "step": 510
351
+ },
352
+ {
353
+ "epoch": 4.52,
354
+ "learning_rate": 5.478260869565217e-05,
355
+ "loss": 0.0116,
356
+ "step": 520
357
+ },
358
+ {
359
+ "epoch": 4.61,
360
+ "learning_rate": 5.391304347826087e-05,
361
+ "loss": 0.0073,
362
+ "step": 530
363
+ },
364
+ {
365
+ "epoch": 4.7,
366
+ "learning_rate": 5.3043478260869574e-05,
367
+ "loss": 0.0022,
368
+ "step": 540
369
+ },
370
+ {
371
+ "epoch": 4.78,
372
+ "learning_rate": 5.217391304347826e-05,
373
+ "loss": 0.0012,
374
+ "step": 550
375
+ },
376
+ {
377
+ "epoch": 4.87,
378
+ "learning_rate": 5.1304347826086966e-05,
379
+ "loss": 0.0082,
380
+ "step": 560
381
+ },
382
+ {
383
+ "epoch": 4.96,
384
+ "learning_rate": 5.0434782608695655e-05,
385
+ "loss": 0.003,
386
+ "step": 570
387
+ },
388
+ {
389
+ "epoch": 5.0,
390
+ "eval_loss": 0.008183675818145275,
391
+ "eval_mse": 0.008183675818145275,
392
+ "eval_runtime": 76.479,
393
+ "eval_samples_per_second": 3.007,
394
+ "eval_steps_per_second": 0.379,
395
+ "step": 575
396
+ },
397
+ {
398
+ "epoch": 5.04,
399
+ "learning_rate": 4.956521739130435e-05,
400
+ "loss": 0.0178,
401
+ "step": 580
402
+ },
403
+ {
404
+ "epoch": 5.13,
405
+ "learning_rate": 4.8695652173913046e-05,
406
+ "loss": 0.0035,
407
+ "step": 590
408
+ },
409
+ {
410
+ "epoch": 5.22,
411
+ "learning_rate": 4.782608695652174e-05,
412
+ "loss": 0.0048,
413
+ "step": 600
414
+ },
415
+ {
416
+ "epoch": 5.3,
417
+ "learning_rate": 4.695652173913044e-05,
418
+ "loss": 0.0013,
419
+ "step": 610
420
+ },
421
+ {
422
+ "epoch": 5.39,
423
+ "learning_rate": 4.608695652173913e-05,
424
+ "loss": 0.0058,
425
+ "step": 620
426
+ },
427
+ {
428
+ "epoch": 5.48,
429
+ "learning_rate": 4.521739130434783e-05,
430
+ "loss": 0.006,
431
+ "step": 630
432
+ },
433
+ {
434
+ "epoch": 5.57,
435
+ "learning_rate": 4.4347826086956525e-05,
436
+ "loss": 0.0053,
437
+ "step": 640
438
+ },
439
+ {
440
+ "epoch": 5.65,
441
+ "learning_rate": 4.347826086956522e-05,
442
+ "loss": 0.0011,
443
+ "step": 650
444
+ },
445
+ {
446
+ "epoch": 5.74,
447
+ "learning_rate": 4.2608695652173916e-05,
448
+ "loss": 0.0012,
449
+ "step": 660
450
+ },
451
+ {
452
+ "epoch": 5.83,
453
+ "learning_rate": 4.1739130434782605e-05,
454
+ "loss": 0.0011,
455
+ "step": 670
456
+ },
457
+ {
458
+ "epoch": 5.91,
459
+ "learning_rate": 4.086956521739131e-05,
460
+ "loss": 0.0017,
461
+ "step": 680
462
+ },
463
+ {
464
+ "epoch": 6.0,
465
+ "learning_rate": 4e-05,
466
+ "loss": 0.0012,
467
+ "step": 690
468
+ },
469
+ {
470
+ "epoch": 6.0,
471
+ "eval_loss": 0.00548972561955452,
472
+ "eval_mse": 0.00548972561955452,
473
+ "eval_runtime": 68.4873,
474
+ "eval_samples_per_second": 3.358,
475
+ "eval_steps_per_second": 0.423,
476
+ "step": 690
477
+ },
478
+ {
479
+ "epoch": 6.09,
480
+ "learning_rate": 3.91304347826087e-05,
481
+ "loss": 0.0025,
482
+ "step": 700
483
+ },
484
+ {
485
+ "epoch": 6.17,
486
+ "learning_rate": 3.8260869565217395e-05,
487
+ "loss": 0.0024,
488
+ "step": 710
489
+ },
490
+ {
491
+ "epoch": 6.26,
492
+ "learning_rate": 3.739130434782609e-05,
493
+ "loss": 0.0016,
494
+ "step": 720
495
+ },
496
+ {
497
+ "epoch": 6.35,
498
+ "learning_rate": 3.6521739130434786e-05,
499
+ "loss": 0.0051,
500
+ "step": 730
501
+ },
502
+ {
503
+ "epoch": 6.43,
504
+ "learning_rate": 3.565217391304348e-05,
505
+ "loss": 0.0046,
506
+ "step": 740
507
+ },
508
+ {
509
+ "epoch": 6.52,
510
+ "learning_rate": 3.478260869565218e-05,
511
+ "loss": 0.0031,
512
+ "step": 750
513
+ },
514
+ {
515
+ "epoch": 6.61,
516
+ "learning_rate": 3.3913043478260867e-05,
517
+ "loss": 0.0012,
518
+ "step": 760
519
+ },
520
+ {
521
+ "epoch": 6.7,
522
+ "learning_rate": 3.304347826086956e-05,
523
+ "loss": 0.0078,
524
+ "step": 770
525
+ },
526
+ {
527
+ "epoch": 6.78,
528
+ "learning_rate": 3.217391304347826e-05,
529
+ "loss": 0.0045,
530
+ "step": 780
531
+ },
532
+ {
533
+ "epoch": 6.87,
534
+ "learning_rate": 3.130434782608696e-05,
535
+ "loss": 0.0014,
536
+ "step": 790
537
+ },
538
+ {
539
+ "epoch": 6.96,
540
+ "learning_rate": 3.0434782608695656e-05,
541
+ "loss": 0.0015,
542
+ "step": 800
543
+ },
544
+ {
545
+ "epoch": 7.0,
546
+ "eval_loss": 0.005614197812974453,
547
+ "eval_mse": 0.005614197812974453,
548
+ "eval_runtime": 59.7135,
549
+ "eval_samples_per_second": 3.852,
550
+ "eval_steps_per_second": 0.486,
551
+ "step": 805
552
+ },
553
+ {
554
+ "epoch": 7.04,
555
+ "learning_rate": 2.9565217391304352e-05,
556
+ "loss": 0.0055,
557
+ "step": 810
558
+ },
559
+ {
560
+ "epoch": 7.13,
561
+ "learning_rate": 2.8695652173913044e-05,
562
+ "loss": 0.0027,
563
+ "step": 820
564
+ },
565
+ {
566
+ "epoch": 7.22,
567
+ "learning_rate": 2.782608695652174e-05,
568
+ "loss": 0.001,
569
+ "step": 830
570
+ },
571
+ {
572
+ "epoch": 7.3,
573
+ "learning_rate": 2.6956521739130436e-05,
574
+ "loss": 0.0033,
575
+ "step": 840
576
+ },
577
+ {
578
+ "epoch": 7.39,
579
+ "learning_rate": 2.608695652173913e-05,
580
+ "loss": 0.0018,
581
+ "step": 850
582
+ },
583
+ {
584
+ "epoch": 7.48,
585
+ "learning_rate": 2.5217391304347827e-05,
586
+ "loss": 0.0013,
587
+ "step": 860
588
+ },
589
+ {
590
+ "epoch": 7.57,
591
+ "learning_rate": 2.4347826086956523e-05,
592
+ "loss": 0.001,
593
+ "step": 870
594
+ },
595
+ {
596
+ "epoch": 7.65,
597
+ "learning_rate": 2.347826086956522e-05,
598
+ "loss": 0.0032,
599
+ "step": 880
600
+ },
601
+ {
602
+ "epoch": 7.74,
603
+ "learning_rate": 2.2608695652173914e-05,
604
+ "loss": 0.0011,
605
+ "step": 890
606
+ },
607
+ {
608
+ "epoch": 7.83,
609
+ "learning_rate": 2.173913043478261e-05,
610
+ "loss": 0.0007,
611
+ "step": 900
612
+ },
613
+ {
614
+ "epoch": 7.91,
615
+ "learning_rate": 2.0869565217391303e-05,
616
+ "loss": 0.0012,
617
+ "step": 910
618
+ },
619
+ {
620
+ "epoch": 8.0,
621
+ "learning_rate": 2e-05,
622
+ "loss": 0.0008,
623
+ "step": 920
624
+ },
625
+ {
626
+ "epoch": 8.0,
627
+ "eval_loss": 0.005982376169413328,
628
+ "eval_mse": 0.005982376169413328,
629
+ "eval_runtime": 60.492,
630
+ "eval_samples_per_second": 3.802,
631
+ "eval_steps_per_second": 0.479,
632
+ "step": 920
633
+ },
634
+ {
635
+ "epoch": 8.09,
636
+ "learning_rate": 1.9130434782608697e-05,
637
+ "loss": 0.0011,
638
+ "step": 930
639
+ },
640
+ {
641
+ "epoch": 8.17,
642
+ "learning_rate": 1.8260869565217393e-05,
643
+ "loss": 0.0009,
644
+ "step": 940
645
+ },
646
+ {
647
+ "epoch": 8.26,
648
+ "learning_rate": 1.739130434782609e-05,
649
+ "loss": 0.0027,
650
+ "step": 950
651
+ },
652
+ {
653
+ "epoch": 8.35,
654
+ "learning_rate": 1.652173913043478e-05,
655
+ "loss": 0.0028,
656
+ "step": 960
657
+ },
658
+ {
659
+ "epoch": 8.43,
660
+ "learning_rate": 1.565217391304348e-05,
661
+ "loss": 0.0012,
662
+ "step": 970
663
+ },
664
+ {
665
+ "epoch": 8.52,
666
+ "learning_rate": 1.4782608695652176e-05,
667
+ "loss": 0.0008,
668
+ "step": 980
669
+ },
670
+ {
671
+ "epoch": 8.61,
672
+ "learning_rate": 1.391304347826087e-05,
673
+ "loss": 0.0047,
674
+ "step": 990
675
+ },
676
+ {
677
+ "epoch": 8.7,
678
+ "learning_rate": 1.3043478260869566e-05,
679
+ "loss": 0.0013,
680
+ "step": 1000
681
+ },
682
+ {
683
+ "epoch": 8.78,
684
+ "learning_rate": 1.2173913043478261e-05,
685
+ "loss": 0.0009,
686
+ "step": 1010
687
+ },
688
+ {
689
+ "epoch": 8.87,
690
+ "learning_rate": 1.1304347826086957e-05,
691
+ "loss": 0.0009,
692
+ "step": 1020
693
+ },
694
+ {
695
+ "epoch": 8.96,
696
+ "learning_rate": 1.0434782608695651e-05,
697
+ "loss": 0.0092,
698
+ "step": 1030
699
+ },
700
+ {
701
+ "epoch": 9.0,
702
+ "eval_loss": 0.005765838548541069,
703
+ "eval_mse": 0.005765838548541069,
704
+ "eval_runtime": 61.2576,
705
+ "eval_samples_per_second": 3.755,
706
+ "eval_steps_per_second": 0.473,
707
+ "step": 1035
708
+ },
709
+ {
710
+ "epoch": 9.04,
711
+ "learning_rate": 9.565217391304349e-06,
712
+ "loss": 0.0008,
713
+ "step": 1040
714
+ },
715
+ {
716
+ "epoch": 9.13,
717
+ "learning_rate": 8.695652173913044e-06,
718
+ "loss": 0.0023,
719
+ "step": 1050
720
+ },
721
+ {
722
+ "epoch": 9.22,
723
+ "learning_rate": 7.82608695652174e-06,
724
+ "loss": 0.0011,
725
+ "step": 1060
726
+ },
727
+ {
728
+ "epoch": 9.3,
729
+ "learning_rate": 6.956521739130435e-06,
730
+ "loss": 0.0011,
731
+ "step": 1070
732
+ },
733
+ {
734
+ "epoch": 9.39,
735
+ "learning_rate": 6.086956521739131e-06,
736
+ "loss": 0.0066,
737
+ "step": 1080
738
+ },
739
+ {
740
+ "epoch": 9.48,
741
+ "learning_rate": 5.217391304347826e-06,
742
+ "loss": 0.0006,
743
+ "step": 1090
744
+ },
745
+ {
746
+ "epoch": 9.57,
747
+ "learning_rate": 4.347826086956522e-06,
748
+ "loss": 0.001,
749
+ "step": 1100
750
+ },
751
+ {
752
+ "epoch": 9.65,
753
+ "learning_rate": 3.4782608695652175e-06,
754
+ "loss": 0.0026,
755
+ "step": 1110
756
+ },
757
+ {
758
+ "epoch": 9.74,
759
+ "learning_rate": 2.608695652173913e-06,
760
+ "loss": 0.0048,
761
+ "step": 1120
762
+ },
763
+ {
764
+ "epoch": 9.83,
765
+ "learning_rate": 1.7391304347826088e-06,
766
+ "loss": 0.0009,
767
+ "step": 1130
768
+ },
769
+ {
770
+ "epoch": 9.91,
771
+ "learning_rate": 8.695652173913044e-07,
772
+ "loss": 0.0011,
773
+ "step": 1140
774
+ },
775
+ {
776
+ "epoch": 10.0,
777
+ "learning_rate": 0.0,
778
+ "loss": 0.0012,
779
+ "step": 1150
780
+ }
781
+ ],
782
+ "max_steps": 1150,
783
+ "num_train_epochs": 10,
784
+ "total_flos": 0.0,
785
+ "trial_name": null,
786
+ "trial_params": null
787
+ }
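The `log_history` above mixes a training-loss entry every 10 steps with an `eval_mse` entry at the end of each epoch; here is a minimal sketch for pulling the per-epoch eval curve out of this file.

```python
# Minimal sketch: extract per-epoch eval MSE from trainer_state.json.
import json

with open("trainer_state.json") as f:
    state = json.load(f)

eval_points = [(entry["epoch"], entry["eval_mse"])
               for entry in state["log_history"] if "eval_mse" in entry]
for epoch, mse in eval_points:
    print(f"epoch {epoch:>4}: eval_mse = {mse:.6f}")
```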
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f80486caf2c4a4c01686722df7401aab74c239c5332eccb9fb19a4c14d71f108
+ size 3899