Ilia Bondarev committed
Commit 278f29f · 1 Parent(s): c23eae2

added LoRA checkpoints

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.json filter=lfs diff=lfs merge=lfs -text
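With this rule in place, JSON files in the repository are stored via Git LFS, which is why the *.json files added below appear as three-line pointer stubs (version, oid, size) rather than their actual contents. A minimal sketch of reading such a pointer from a checkout that has not run `git lfs pull` (the file path is an assumption, not part of this commit):

```python
# Minimal sketch: parse a Git LFS pointer stub of the form shown below
# (version / oid sha256:<hash> / size <bytes>). The path is illustrative;
# run `git lfs pull` to replace pointers with the actual file contents.
def parse_lfs_pointer(path: str) -> dict:
    fields = {}
    with open(path) as f:
        for line in f:
            key, _, value = line.strip().partition(" ")
            fields[key] = value
    return fields

pointer = parse_lfs_pointer("adapter_config.json")  # assumed local checkout without LFS objects
print(pointer["oid"], pointer["size"])
```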
README.md CHANGED
@@ -1,3 +1,60 @@
----
-license: apache-2.0
----
+---
+library_name: peft
+license: other
+base_model: mistralai/Mistral-Small-3.1-24B-Instruct-2503
+tags:
+- llama-factory
+- lora
+- generated_from_trainer
+model-index:
+- name: advertisment_instraction_mistral_lora
+  results: []
+---
+
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+
+# advertisment_instraction_mistral_lora
+
+This model is a fine-tuned version of [mistralai/Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503) on the advertisment_instraction dataset.
+
+## Model description
+
+More information needed
+
+## Intended uses & limitations
+
+More information needed
+
+## Training and evaluation data
+
+More information needed
+
+## Training procedure
+
+### Training hyperparameters
+
+The following hyperparameters were used during training:
+- learning_rate: 5e-05
+- train_batch_size: 4
+- eval_batch_size: 8
+- seed: 42
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 16
+- optimizer: ADAMW_TORCH with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 10
+- mixed_precision_training: Native AMP
+
+### Training results
+
+
+
+### Framework versions
+
+- PEFT 0.15.1
+- Transformers 4.51.3
+- Pytorch 2.6.0a0+df5bbc09d1.nv24.12
+- Datasets 3.5.0
+- Tokenizers 0.21.1
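Since the card declares `library_name: peft` and the LoRA weights ship as `adapter_config.json` plus `adapter_model.safetensors`, the adapter is meant to be applied on top of the base checkpoint. A minimal usage sketch, assuming the adapter is attached with PEFT (the adapter path/repo id is a placeholder, and the exact transformers Auto class for Mistral-Small-3.1 may differ in your environment):

```python
# Minimal sketch (assumptions): the base model loads with AutoModelForCausalLM
# and this commit's adapter can be attached with PEFT. The adapter_id below is
# a hypothetical placeholder for a local checkout or Hub repo id.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"
adapter_id = "path/to/this/adapter"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = "Write a short advertisement for a coffee shop."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```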
adapter_config.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:60c342b25cf2d41af0cabb249185d40b70aaf2b5fd9eb6add8167c70307ac2fb
+size 929
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1de3b9acde47e6b0c3dbcafe211e0e8f1b6d60defac67f28a30c9a6d111fd85f
+size 203724176
all_results.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:294836220b74fe328e9a0cfd774ce938c4660348657abab0e7c11bce3fa3bc47
+size 209
chat_template.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d4b1a286509cd7a45186c5a149200a61405eaee8fb4c2863a90d43ff6151775f
+size 2772
preprocessor_config.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0f1a312ed75c86bccdf65333b5b43507f597028d895dbb9cf9f16be9f87b52f1
+size 634
processor_config.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4ceef0d0cbf062ffd522d85fcbf5248c2db8659c589afcc47b2f26810e7d9a58
+size 189
special_tokens_map.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c98562eff7be177aafa0bf23cacb9c86549aef1c4c60e91931bded3d70fe6f8f
+size 21449
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b76085f9923309d873994d444989f7eb6ec074b06f25b58f1e8d7b7741070949
+size 17078037
tokenizer_config.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ca4b3e02214a0293ca651d23fddda8c900d2acaa49e41deb473980313345c5b8
+size 198730
train_results.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:294836220b74fe328e9a0cfd774ce938c4660348657abab0e7c11bce3fa3bc47
+size 209
trainer_log.jsonl ADDED
@@ -0,0 +1,111 @@
+{"current_steps": 5, "total_steps": 620, "loss": 2.2782, "lr": 3.225806451612903e-06, "epoch": 0.08, "percentage": 0.81, "elapsed_time": "0:00:29", "remaining_time": "1:00:45"}
+{"current_steps": 10, "total_steps": 620, "loss": 1.968, "lr": 6.451612903225806e-06, "epoch": 0.16, "percentage": 1.61, "elapsed_time": "0:00:46", "remaining_time": "0:46:59"}
+{"current_steps": 15, "total_steps": 620, "loss": 1.9317, "lr": 1.0483870967741936e-05, "epoch": 0.24, "percentage": 2.42, "elapsed_time": "0:01:03", "remaining_time": "0:42:21"}
+{"current_steps": 20, "total_steps": 620, "loss": 1.7964, "lr": 1.4516129032258066e-05, "epoch": 0.32, "percentage": 3.23, "elapsed_time": "0:01:10", "remaining_time": "0:35:12"}
+{"current_steps": 25, "total_steps": 620, "loss": 1.7135, "lr": 1.8548387096774193e-05, "epoch": 0.4, "percentage": 4.03, "elapsed_time": "0:01:17", "remaining_time": "0:30:51"}
+{"current_steps": 30, "total_steps": 620, "loss": 1.7883, "lr": 2.258064516129032e-05, "epoch": 0.48, "percentage": 4.84, "elapsed_time": "0:01:23", "remaining_time": "0:27:24"}
+{"current_steps": 35, "total_steps": 620, "loss": 1.5947, "lr": 2.661290322580645e-05, "epoch": 0.56, "percentage": 5.65, "elapsed_time": "0:01:29", "remaining_time": "0:25:00"}
+{"current_steps": 40, "total_steps": 620, "loss": 1.5961, "lr": 3.0645161290322585e-05, "epoch": 0.64, "percentage": 6.45, "elapsed_time": "0:01:34", "remaining_time": "0:22:45"}
+{"current_steps": 45, "total_steps": 620, "loss": 1.596, "lr": 3.467741935483872e-05, "epoch": 0.72, "percentage": 7.26, "elapsed_time": "0:01:38", "remaining_time": "0:20:59"}
+{"current_steps": 50, "total_steps": 620, "loss": 1.7038, "lr": 3.870967741935484e-05, "epoch": 0.8, "percentage": 8.06, "elapsed_time": "0:01:47", "remaining_time": "0:20:27"}
+{"current_steps": 55, "total_steps": 620, "loss": 1.5217, "lr": 4.2741935483870973e-05, "epoch": 0.88, "percentage": 8.87, "elapsed_time": "0:01:53", "remaining_time": "0:19:29"}
+{"current_steps": 60, "total_steps": 620, "loss": 1.6122, "lr": 4.67741935483871e-05, "epoch": 0.96, "percentage": 9.68, "elapsed_time": "0:01:58", "remaining_time": "0:18:23"}
+{"current_steps": 65, "total_steps": 620, "loss": 1.3147, "lr": 4.999960377651517e-05, "epoch": 1.032, "percentage": 10.48, "elapsed_time": "0:02:02", "remaining_time": "0:17:23"}
+{"current_steps": 70, "total_steps": 620, "loss": 1.1399, "lr": 4.998573727324295e-05, "epoch": 1.112, "percentage": 11.29, "elapsed_time": "0:02:06", "remaining_time": "0:16:34"}
+{"current_steps": 75, "total_steps": 620, "loss": 1.268, "lr": 4.9952072153383575e-05, "epoch": 1.192, "percentage": 12.1, "elapsed_time": "0:02:10", "remaining_time": "0:15:51"}
+{"current_steps": 80, "total_steps": 620, "loss": 1.1865, "lr": 4.9898635093068036e-05, "epoch": 1.272, "percentage": 12.9, "elapsed_time": "0:02:16", "remaining_time": "0:15:23"}
+{"current_steps": 85, "total_steps": 620, "loss": 0.9661, "lr": 4.982546843564834e-05, "epoch": 1.3519999999999999, "percentage": 13.71, "elapsed_time": "0:02:21", "remaining_time": "0:14:50"}
+{"current_steps": 90, "total_steps": 620, "loss": 0.9975, "lr": 4.97326301581448e-05, "epoch": 1.432, "percentage": 14.52, "elapsed_time": "0:02:25", "remaining_time": "0:14:18"}
+{"current_steps": 95, "total_steps": 620, "loss": 1.0662, "lr": 4.962019382530521e-05, "epoch": 1.512, "percentage": 15.32, "elapsed_time": "0:02:30", "remaining_time": "0:13:49"}
+{"current_steps": 100, "total_steps": 620, "loss": 1.2219, "lr": 4.948824853131236e-05, "epoch": 1.592, "percentage": 16.13, "elapsed_time": "0:02:34", "remaining_time": "0:13:24"}
+{"current_steps": 105, "total_steps": 620, "loss": 1.2591, "lr": 4.933689882918618e-05, "epoch": 1.6720000000000002, "percentage": 16.94, "elapsed_time": "0:02:39", "remaining_time": "0:13:00"}
+{"current_steps": 110, "total_steps": 620, "loss": 1.1594, "lr": 4.916626464793616e-05, "epoch": 1.752, "percentage": 17.74, "elapsed_time": "0:02:43", "remaining_time": "0:12:38"}
+{"current_steps": 115, "total_steps": 620, "loss": 1.1577, "lr": 4.897648119753006e-05, "epoch": 1.8319999999999999, "percentage": 18.55, "elapsed_time": "0:02:47", "remaining_time": "0:12:17"}
+{"current_steps": 120, "total_steps": 620, "loss": 1.1425, "lr": 4.876769886175396e-05, "epoch": 1.912, "percentage": 19.35, "elapsed_time": "0:02:52", "remaining_time": "0:11:58"}
+{"current_steps": 125, "total_steps": 620, "loss": 1.3166, "lr": 4.8540083079048645e-05, "epoch": 1.992, "percentage": 20.16, "elapsed_time": "0:02:57", "remaining_time": "0:11:41"}
+{"current_steps": 130, "total_steps": 620, "loss": 0.7055, "lr": 4.829381421141671e-05, "epoch": 2.064, "percentage": 20.97, "elapsed_time": "0:03:01", "remaining_time": "0:11:22"}
+{"current_steps": 135, "total_steps": 620, "loss": 0.6371, "lr": 4.802908740150431e-05, "epoch": 2.144, "percentage": 21.77, "elapsed_time": "0:03:05", "remaining_time": "0:11:06"}
+{"current_steps": 140, "total_steps": 620, "loss": 0.6376, "lr": 4.7746112417970766e-05, "epoch": 2.224, "percentage": 22.58, "elapsed_time": "0:03:09", "remaining_time": "0:10:51"}
+{"current_steps": 145, "total_steps": 620, "loss": 0.6777, "lr": 4.7445113489268544e-05, "epoch": 2.304, "percentage": 23.39, "elapsed_time": "0:03:14", "remaining_time": "0:10:36"}
+{"current_steps": 150, "total_steps": 620, "loss": 0.7888, "lr": 4.712632912596538e-05, "epoch": 2.384, "percentage": 24.19, "elapsed_time": "0:03:18", "remaining_time": "0:10:22"}
+{"current_steps": 155, "total_steps": 620, "loss": 0.6546, "lr": 4.6790011931749314e-05, "epoch": 2.464, "percentage": 25.0, "elapsed_time": "0:03:23", "remaining_time": "0:10:09"}
+{"current_steps": 160, "total_steps": 620, "loss": 0.6856, "lr": 4.643642840326627e-05, "epoch": 2.544, "percentage": 25.81, "elapsed_time": "0:03:27", "remaining_time": "0:09:56"}
+{"current_steps": 165, "total_steps": 620, "loss": 0.7111, "lr": 4.60658587189491e-05, "epoch": 2.624, "percentage": 26.61, "elapsed_time": "0:03:32", "remaining_time": "0:09:44"}
+{"current_steps": 170, "total_steps": 620, "loss": 0.7364, "lr": 4.5678596517004966e-05, "epoch": 2.7039999999999997, "percentage": 27.42, "elapsed_time": "0:03:36", "remaining_time": "0:09:32"}
+{"current_steps": 175, "total_steps": 620, "loss": 0.6989, "lr": 4.527494866273753e-05, "epoch": 2.784, "percentage": 28.23, "elapsed_time": "0:03:40", "remaining_time": "0:09:21"}
+{"current_steps": 180, "total_steps": 620, "loss": 0.7114, "lr": 4.48552350053878e-05, "epoch": 2.864, "percentage": 29.03, "elapsed_time": "0:03:45", "remaining_time": "0:09:10"}
+{"current_steps": 185, "total_steps": 620, "loss": 0.7303, "lr": 4.441978812468666e-05, "epoch": 2.944, "percentage": 29.84, "elapsed_time": "0:03:49", "remaining_time": "0:09:00"}
+{"current_steps": 190, "total_steps": 620, "loss": 0.701, "lr": 4.3968953067319777e-05, "epoch": 3.016, "percentage": 30.65, "elapsed_time": "0:03:53", "remaining_time": "0:08:48"}
+{"current_steps": 195, "total_steps": 620, "loss": 0.3767, "lr": 4.350308707351372e-05, "epoch": 3.096, "percentage": 31.45, "elapsed_time": "0:03:58", "remaining_time": "0:08:38"}
+{"current_steps": 200, "total_steps": 620, "loss": 0.3995, "lr": 4.302255929396003e-05, "epoch": 3.176, "percentage": 32.26, "elapsed_time": "0:04:02", "remaining_time": "0:08:29"}
+{"current_steps": 205, "total_steps": 620, "loss": 0.4409, "lr": 4.2527750497301323e-05, "epoch": 3.2560000000000002, "percentage": 33.06, "elapsed_time": "0:04:07", "remaining_time": "0:08:20"}
+{"current_steps": 210, "total_steps": 620, "loss": 0.3631, "lr": 4.201905276841153e-05, "epoch": 3.336, "percentage": 33.87, "elapsed_time": "0:04:11", "remaining_time": "0:08:11"}
+{"current_steps": 215, "total_steps": 620, "loss": 0.3924, "lr": 4.1496869197709146e-05, "epoch": 3.416, "percentage": 34.68, "elapsed_time": "0:04:16", "remaining_time": "0:08:02"}
+{"current_steps": 220, "total_steps": 620, "loss": 0.3611, "lr": 4.096161356174959e-05, "epoch": 3.496, "percentage": 35.48, "elapsed_time": "0:04:20", "remaining_time": "0:07:53"}
+{"current_steps": 225, "total_steps": 620, "loss": 0.357, "lr": 4.0413709995350145e-05, "epoch": 3.576, "percentage": 36.29, "elapsed_time": "0:04:24", "remaining_time": "0:07:45"}
+{"current_steps": 230, "total_steps": 620, "loss": 0.3885, "lr": 3.985359265550682e-05, "epoch": 3.656, "percentage": 37.1, "elapsed_time": "0:04:29", "remaining_time": "0:07:36"}
+{"current_steps": 235, "total_steps": 620, "loss": 0.4416, "lr": 3.928170537736981e-05, "epoch": 3.7359999999999998, "percentage": 37.9, "elapsed_time": "0:04:33", "remaining_time": "0:07:28"}
+{"current_steps": 240, "total_steps": 620, "loss": 0.4006, "lr": 3.869850132254996e-05, "epoch": 3.816, "percentage": 38.71, "elapsed_time": "0:04:38", "remaining_time": "0:07:20"}
+{"current_steps": 245, "total_steps": 620, "loss": 0.3906, "lr": 3.8104442620035e-05, "epoch": 3.896, "percentage": 39.52, "elapsed_time": "0:04:42", "remaining_time": "0:07:12"}
+{"current_steps": 250, "total_steps": 620, "loss": 0.4063, "lr": 3.7500000000000003e-05, "epoch": 3.976, "percentage": 40.32, "elapsed_time": "0:04:46", "remaining_time": "0:07:04"}
+{"current_steps": 255, "total_steps": 620, "loss": 0.2572, "lr": 3.688565242080238e-05, "epoch": 4.048, "percentage": 41.13, "elapsed_time": "0:04:51", "remaining_time": "0:06:56"}
+{"current_steps": 260, "total_steps": 620, "loss": 0.1815, "lr": 3.626188668945683e-05, "epoch": 4.128, "percentage": 41.94, "elapsed_time": "0:04:55", "remaining_time": "0:06:48"}
+{"current_steps": 265, "total_steps": 620, "loss": 0.2017, "lr": 3.562919707589102e-05, "epoch": 4.208, "percentage": 42.74, "elapsed_time": "0:04:59", "remaining_time": "0:06:41"}
+{"current_steps": 270, "total_steps": 620, "loss": 0.2189, "lr": 3.498808492128776e-05, "epoch": 4.288, "percentage": 43.55, "elapsed_time": "0:05:04", "remaining_time": "0:06:34"}
+{"current_steps": 275, "total_steps": 620, "loss": 0.2015, "lr": 3.4339058240823843e-05, "epoch": 4.368, "percentage": 44.35, "elapsed_time": "0:05:08", "remaining_time": "0:06:27"}
+{"current_steps": 280, "total_steps": 620, "loss": 0.2357, "lr": 3.3682631321120504e-05, "epoch": 4.448, "percentage": 45.16, "elapsed_time": "0:05:12", "remaining_time": "0:06:19"}
+{"current_steps": 285, "total_steps": 620, "loss": 0.2102, "lr": 3.301932431272439e-05, "epoch": 4.5280000000000005, "percentage": 45.97, "elapsed_time": "0:05:17", "remaining_time": "0:06:13"}
+{"current_steps": 290, "total_steps": 620, "loss": 0.2129, "lr": 3.234966281794193e-05, "epoch": 4.608, "percentage": 46.77, "elapsed_time": "0:05:21", "remaining_time": "0:06:06"}
+{"current_steps": 295, "total_steps": 620, "loss": 0.2084, "lr": 3.167417747435379e-05, "epoch": 4.688, "percentage": 47.58, "elapsed_time": "0:05:26", "remaining_time": "0:05:59"}
+{"current_steps": 300, "total_steps": 620, "loss": 0.2649, "lr": 3.099340353433946e-05, "epoch": 4.768, "percentage": 48.39, "elapsed_time": "0:05:30", "remaining_time": "0:05:52"}
+{"current_steps": 305, "total_steps": 620, "loss": 0.2304, "lr": 3.0307880440944902e-05, "epoch": 4.848, "percentage": 49.19, "elapsed_time": "0:05:34", "remaining_time": "0:05:45"}
+{"current_steps": 310, "total_steps": 620, "loss": 0.2567, "lr": 2.961815140042974e-05, "epoch": 4.928, "percentage": 50.0, "elapsed_time": "0:05:39", "remaining_time": "0:05:39"}
+{"current_steps": 315, "total_steps": 620, "loss": 0.2244, "lr": 2.892476295183232e-05, "epoch": 5.0, "percentage": 50.81, "elapsed_time": "0:05:43", "remaining_time": "0:05:32"}
+{"current_steps": 320, "total_steps": 620, "loss": 0.1303, "lr": 2.822826453389404e-05, "epoch": 5.08, "percentage": 51.61, "elapsed_time": "0:05:47", "remaining_time": "0:05:25"}
+{"current_steps": 325, "total_steps": 620, "loss": 0.1348, "lr": 2.7529208049685807e-05, "epoch": 5.16, "percentage": 52.42, "elapsed_time": "0:05:51", "remaining_time": "0:05:19"}
+{"current_steps": 330, "total_steps": 620, "loss": 0.1387, "lr": 2.6828147429281902e-05, "epoch": 5.24, "percentage": 53.23, "elapsed_time": "0:05:56", "remaining_time": "0:05:13"}
+{"current_steps": 335, "total_steps": 620, "loss": 0.1303, "lr": 2.612563819082757e-05, "epoch": 5.32, "percentage": 54.03, "elapsed_time": "0:06:00", "remaining_time": "0:05:06"}
+{"current_steps": 340, "total_steps": 620, "loss": 0.1399, "lr": 2.5422237000348276e-05, "epoch": 5.4, "percentage": 54.84, "elapsed_time": "0:06:04", "remaining_time": "0:05:00"}
+{"current_steps": 345, "total_steps": 620, "loss": 0.1398, "lr": 2.4718501230649355e-05, "epoch": 5.48, "percentage": 55.65, "elapsed_time": "0:06:09", "remaining_time": "0:04:54"}
+{"current_steps": 350, "total_steps": 620, "loss": 0.1509, "lr": 2.4014988519655618e-05, "epoch": 5.5600000000000005, "percentage": 56.45, "elapsed_time": "0:06:13", "remaining_time": "0:04:48"}
+{"current_steps": 355, "total_steps": 620, "loss": 0.1339, "lr": 2.331225632854087e-05, "epoch": 5.64, "percentage": 57.26, "elapsed_time": "0:06:17", "remaining_time": "0:04:42"}
+{"current_steps": 360, "total_steps": 620, "loss": 0.1239, "lr": 2.261086149999755e-05, "epoch": 5.72, "percentage": 58.06, "elapsed_time": "0:06:23", "remaining_time": "0:04:37"}
+{"current_steps": 365, "total_steps": 620, "loss": 0.1603, "lr": 2.1911359816996342e-05, "epoch": 5.8, "percentage": 58.87, "elapsed_time": "0:06:28", "remaining_time": "0:04:31"}
+{"current_steps": 370, "total_steps": 620, "loss": 0.1687, "lr": 2.1214305562385592e-05, "epoch": 5.88, "percentage": 59.68, "elapsed_time": "0:06:32", "remaining_time": "0:04:25"}
+{"current_steps": 375, "total_steps": 620, "loss": 0.1581, "lr": 2.0520251079679373e-05, "epoch": 5.96, "percentage": 60.48, "elapsed_time": "0:06:36", "remaining_time": "0:04:19"}
+{"current_steps": 380, "total_steps": 620, "loss": 0.1275, "lr": 1.982974633538232e-05, "epoch": 6.032, "percentage": 61.29, "elapsed_time": "0:06:40", "remaining_time": "0:04:13"}
+{"current_steps": 385, "total_steps": 620, "loss": 0.0924, "lr": 1.914333848319795e-05, "epoch": 6.112, "percentage": 62.1, "elapsed_time": "0:06:45", "remaining_time": "0:04:07"}
+{"current_steps": 390, "total_steps": 620, "loss": 0.095, "lr": 1.8461571430465834e-05, "epoch": 6.192, "percentage": 62.9, "elapsed_time": "0:06:49", "remaining_time": "0:04:01"}
+{"current_steps": 395, "total_steps": 620, "loss": 0.1112, "lr": 1.778498540717124e-05, "epoch": 6.272, "percentage": 63.71, "elapsed_time": "0:06:53", "remaining_time": "0:03:55"}
+{"current_steps": 400, "total_steps": 620, "loss": 0.1241, "lr": 1.711411653786861e-05, "epoch": 6.352, "percentage": 64.52, "elapsed_time": "0:06:58", "remaining_time": "0:03:50"}
+{"current_steps": 405, "total_steps": 620, "loss": 0.1017, "lr": 1.6449496416858284e-05, "epoch": 6.432, "percentage": 65.32, "elapsed_time": "0:07:02", "remaining_time": "0:03:44"}
+{"current_steps": 410, "total_steps": 620, "loss": 0.0963, "lr": 1.5791651686952823e-05, "epoch": 6.5120000000000005, "percentage": 66.13, "elapsed_time": "0:07:07", "remaining_time": "0:03:38"}
+{"current_steps": 415, "total_steps": 620, "loss": 0.1121, "lr": 1.5141103622167041e-05, "epoch": 6.592, "percentage": 66.94, "elapsed_time": "0:07:11", "remaining_time": "0:03:33"}
+{"current_steps": 420, "total_steps": 620, "loss": 0.1037, "lr": 1.4498367714662128e-05, "epoch": 6.672, "percentage": 67.74, "elapsed_time": "0:07:16", "remaining_time": "0:03:27"}
+{"current_steps": 50, "total_steps": 1340, "loss": 1.9027, "lr": 1.791044776119403e-05, "epoch": 0.373134328358209, "percentage": 3.73, "elapsed_time": "0:02:27", "remaining_time": "1:03:27"}
+{"current_steps": 100, "total_steps": 1340, "loss": 1.6079, "lr": 3.619402985074627e-05, "epoch": 0.746268656716418, "percentage": 7.46, "elapsed_time": "0:03:47", "remaining_time": "0:46:56"}
+{"current_steps": 150, "total_steps": 1340, "loss": 1.4856, "lr": 4.998566623293603e-05, "epoch": 1.1194029850746268, "percentage": 11.19, "elapsed_time": "0:05:08", "remaining_time": "0:40:50"}
+{"current_steps": 200, "total_steps": 1340, "loss": 1.2476, "lr": 4.966409127716367e-05, "epoch": 1.4925373134328357, "percentage": 14.93, "elapsed_time": "0:06:13", "remaining_time": "0:35:31"}
+{"current_steps": 250, "total_steps": 1340, "loss": 1.2446, "lr": 4.892468961153105e-05, "epoch": 1.8656716417910446, "percentage": 18.66, "elapsed_time": "0:07:21", "remaining_time": "0:32:04"}
+{"current_steps": 300, "total_steps": 1340, "loss": 0.9294, "lr": 4.777998721000352e-05, "epoch": 2.2388059701492535, "percentage": 22.39, "elapsed_time": "0:08:37", "remaining_time": "0:29:52"}
+{"current_steps": 350, "total_steps": 1340, "loss": 0.7993, "lr": 4.6249376120430114e-05, "epoch": 2.611940298507463, "percentage": 26.12, "elapsed_time": "0:09:44", "remaining_time": "0:27:33"}
+{"current_steps": 400, "total_steps": 1340, "loss": 0.7859, "lr": 4.43587859498839e-05, "epoch": 2.9850746268656714, "percentage": 29.85, "elapsed_time": "0:10:55", "remaining_time": "0:25:40"}
+{"current_steps": 450, "total_steps": 1340, "loss": 0.4616, "lr": 4.214024459924221e-05, "epoch": 3.3582089552238807, "percentage": 33.58, "elapsed_time": "0:12:05", "remaining_time": "0:23:55"}
+{"current_steps": 500, "total_steps": 1340, "loss": 0.479, "lr": 3.9631335688465334e-05, "epoch": 3.7313432835820897, "percentage": 37.31, "elapsed_time": "0:13:13", "remaining_time": "0:22:13"}
+{"current_steps": 550, "total_steps": 1340, "loss": 0.4252, "lr": 3.6874561864164054e-05, "epoch": 4.104477611940299, "percentage": 41.04, "elapsed_time": "0:14:21", "remaining_time": "0:20:36"}
+{"current_steps": 600, "total_steps": 1340, "loss": 0.267, "lr": 3.391662477546432e-05, "epoch": 4.477611940298507, "percentage": 44.78, "elapsed_time": "0:15:28", "remaining_time": "0:19:04"}
+{"current_steps": 650, "total_steps": 1340, "loss": 0.2861, "lr": 3.0807633915874584e-05, "epoch": 4.850746268656716, "percentage": 48.51, "elapsed_time": "0:16:37", "remaining_time": "0:17:38"}
+{"current_steps": 700, "total_steps": 1340, "loss": 0.2329, "lr": 2.7600257733919886e-05, "epoch": 5.223880597014926, "percentage": 52.24, "elapsed_time": "0:17:44", "remaining_time": "0:16:13"}
+{"current_steps": 750, "total_steps": 1340, "loss": 0.1885, "lr": 2.4348831393313763e-05, "epoch": 5.597014925373134, "percentage": 55.97, "elapsed_time": "0:18:51", "remaining_time": "0:14:49"}
+{"current_steps": 800, "total_steps": 1340, "loss": 0.1992, "lr": 2.110843629782583e-05, "epoch": 5.970149253731344, "percentage": 59.7, "elapsed_time": "0:20:02", "remaining_time": "0:13:31"}
+{"current_steps": 850, "total_steps": 1340, "loss": 0.1429, "lr": 1.793396697432839e-05, "epoch": 6.343283582089552, "percentage": 63.43, "elapsed_time": "0:21:15", "remaining_time": "0:12:15"}
+{"current_steps": 900, "total_steps": 1340, "loss": 0.1495, "lr": 1.4879201121666467e-05, "epoch": 6.7164179104477615, "percentage": 67.16, "elapsed_time": "0:22:23", "remaining_time": "0:10:56"}
+{"current_steps": 950, "total_steps": 1340, "loss": 0.1442, "lr": 1.1995888579364551e-05, "epoch": 7.08955223880597, "percentage": 70.9, "elapsed_time": "0:23:31", "remaining_time": "0:09:39"}
+{"current_steps": 1000, "total_steps": 1340, "loss": 0.1243, "lr": 9.332874649668369e-06, "epoch": 7.462686567164179, "percentage": 74.63, "elapsed_time": "0:24:40", "remaining_time": "0:08:23"}
+{"current_steps": 1050, "total_steps": 1340, "loss": 0.1277, "lr": 6.935272624450431e-06, "epoch": 7.835820895522388, "percentage": 78.36, "elapsed_time": "0:25:50", "remaining_time": "0:07:08"}
+{"current_steps": 1100, "total_steps": 1340, "loss": 0.1099, "lr": 4.843699534944257e-06, "epoch": 8.208955223880597, "percentage": 82.09, "elapsed_time": "0:27:02", "remaining_time": "0:05:53"}
+{"current_steps": 1150, "total_steps": 1340, "loss": 0.1104, "lr": 3.0935880712335773e-06, "epoch": 8.582089552238806, "percentage": 85.82, "elapsed_time": "0:28:09", "remaining_time": "0:04:39"}
+{"current_steps": 1200, "total_steps": 1340, "loss": 0.1081, "lr": 1.7145863280547347e-06, "epoch": 8.955223880597014, "percentage": 89.55, "elapsed_time": "0:29:16", "remaining_time": "0:03:24"}
+{"current_steps": 1250, "total_steps": 1340, "loss": 0.097, "lr": 7.300555456321883e-07, "epoch": 9.328358208955224, "percentage": 93.28, "elapsed_time": "0:30:28", "remaining_time": "0:02:11"}
+{"current_steps": 1300, "total_steps": 1340, "loss": 0.1038, "lr": 1.5667435416370225e-07, "epoch": 9.701492537313433, "percentage": 97.01, "elapsed_time": "0:31:34", "remaining_time": "0:00:58"}
+{"current_steps": 1340, "total_steps": 1340, "epoch": 10.0, "percentage": 100.0, "elapsed_time": "0:32:35", "remaining_time": "0:00:00"}
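Each record in trainer_log.jsonl is a standalone JSON object with step, loss, and learning-rate fields; only the final summary record omits loss and lr. A minimal sketch for reading the log, assuming the file has been fetched locally:

```python
# Minimal sketch (assumption: trainer_log.jsonl is available locally).
# Collect (step, loss) pairs, skipping the final summary record that has no loss.
import json

steps, losses = [], []
with open("trainer_log.jsonl") as f:
    for line in f:
        record = json.loads(line)
        if "loss" in record:
            steps.append(record["current_steps"])
            losses.append(record["loss"])

print(f"{len(steps)} logged points; first loss {losses[0]}, last loss {losses[-1]}")
```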
trainer_state.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:73bb59698d456038fad4c6f48d881aa394d9c9a7f826e06c0303557672a6a14a
+size 5511
training_args.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b1d8fb8817be2be93bdc32d8f6327a185a8c3ccef26b750afbae7d1ca70b9e49
+size 5752