Ilia Bondarev
committed
Commit · 278f29f
Parent(s): c23eae2
added LoRA checkpoints
Files changed:
- .gitattributes +1 -0
- README.md +60 -3
- adapter_config.json +3 -0
- adapter_model.safetensors +3 -0
- all_results.json +3 -0
- chat_template.json +3 -0
- preprocessor_config.json +3 -0
- processor_config.json +3 -0
- special_tokens_map.json +3 -0
- tokenizer.json +3 -0
- tokenizer_config.json +3 -0
- train_results.json +3 -0
- trainer_log.jsonl +111 -0
- trainer_state.json +3 -0
- training_args.bin +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.json filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
@@ -1,3 +1,60 @@
----
-
-
+---
+library_name: peft
+license: other
+base_model: mistralai/Mistral-Small-3.1-24B-Instruct-2503
+tags:
+- llama-factory
+- lora
+- generated_from_trainer
+model-index:
+- name: advertisment_instraction_mistral_lora
+  results: []
+---
+
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+
+# advertisment_instraction_mistral_lora
+
+This model is a fine-tuned version of [mistralai/Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503) on the advertisment_instraction dataset.
+
+## Model description
+
+More information needed
+
+## Intended uses & limitations
+
+More information needed
+
+## Training and evaluation data
+
+More information needed
+
+## Training procedure
+
+### Training hyperparameters
+
+The following hyperparameters were used during training:
+- learning_rate: 5e-05
+- train_batch_size: 4
+- eval_batch_size: 8
+- seed: 42
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 16
+- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 10
+- mixed_precision_training: Native AMP
+
+### Training results
+
+
+
+### Framework versions
+
+- PEFT 0.15.1
+- Transformers 4.51.3
+- Pytorch 2.6.0a0+df5bbc09d1.nv24.12
+- Datasets 3.5.0
+- Tokenizers 0.21.1
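As a quick cross-check (not part of the model card), the derived values in the hyperparameter list above are internally consistent: the total train batch size follows from the per-device batch size and gradient accumulation, and the warmup ratio together with the step count from trainer_log.jsonl gives the warmup length. The single-device assumption below is mine, since the card lists no device count.

```python
# Sanity-check the derived hyperparameters from the card above.
train_batch_size = 4              # per-device batch size (from the card)
gradient_accumulation_steps = 4   # from the card
num_devices = 1                   # assumption: single GPU, not stated in the card

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)     # 16, matching "total_train_batch_size" in the card

total_steps = 620                 # final "total_steps" of the first run in trainer_log.jsonl
warmup_steps = int(total_steps * 0.1)  # lr_scheduler_warmup_ratio: 0.1
print(warmup_steps)               # 62 warmup steps for the cosine schedule
```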
adapter_config.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:60c342b25cf2d41af0cabb249185d40b70aaf2b5fd9eb6add8167c70307ac2fb
+size 929
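Note that the committed files are Git LFS pointer files, not the payloads themselves: each is three `key value` lines (version, oid, size). A minimal sketch of parsing that format, using the adapter_config.json pointer shown above as sample input:

```python
# Parse a Git LFS pointer file into its key/value fields.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:60c342b25cf2d41af0cabb249185d40b70aaf2b5fd9eb6add8167c70307ac2fb
size 929
"""

def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of an LFS pointer into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

ptr = parse_lfs_pointer(POINTER)
print(ptr["size"])  # "929" — size in bytes of the real adapter_config.json
```

The `size` field explains the byte counts listed for each pointer below; the real payload is fetched by LFS using the `oid` hash.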
adapter_model.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1de3b9acde47e6b0c3dbcafe211e0e8f1b6d60defac67f28a30c9a6d111fd85f
+size 203724176
all_results.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:294836220b74fe328e9a0cfd774ce938c4660348657abab0e7c11bce3fa3bc47
+size 209
chat_template.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d4b1a286509cd7a45186c5a149200a61405eaee8fb4c2863a90d43ff6151775f
+size 2772
preprocessor_config.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0f1a312ed75c86bccdf65333b5b43507f597028d895dbb9cf9f16be9f87b52f1
+size 634
processor_config.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4ceef0d0cbf062ffd522d85fcbf5248c2db8659c589afcc47b2f26810e7d9a58
+size 189
special_tokens_map.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c98562eff7be177aafa0bf23cacb9c86549aef1c4c60e91931bded3d70fe6f8f
+size 21449
tokenizer.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b76085f9923309d873994d444989f7eb6ec074b06f25b58f1e8d7b7741070949
+size 17078037
tokenizer_config.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ca4b3e02214a0293ca651d23fddda8c900d2acaa49e41deb473980313345c5b8
+size 198730
train_results.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:294836220b74fe328e9a0cfd774ce938c4660348657abab0e7c11bce3fa3bc47
+size 209
trainer_log.jsonl
ADDED
@@ -0,0 +1,111 @@
+{"current_steps": 5, "total_steps": 620, "loss": 2.2782, "lr": 3.225806451612903e-06, "epoch": 0.08, "percentage": 0.81, "elapsed_time": "0:00:29", "remaining_time": "1:00:45"}
+{"current_steps": 10, "total_steps": 620, "loss": 1.968, "lr": 6.451612903225806e-06, "epoch": 0.16, "percentage": 1.61, "elapsed_time": "0:00:46", "remaining_time": "0:46:59"}
+{"current_steps": 15, "total_steps": 620, "loss": 1.9317, "lr": 1.0483870967741936e-05, "epoch": 0.24, "percentage": 2.42, "elapsed_time": "0:01:03", "remaining_time": "0:42:21"}
+{"current_steps": 20, "total_steps": 620, "loss": 1.7964, "lr": 1.4516129032258066e-05, "epoch": 0.32, "percentage": 3.23, "elapsed_time": "0:01:10", "remaining_time": "0:35:12"}
+{"current_steps": 25, "total_steps": 620, "loss": 1.7135, "lr": 1.8548387096774193e-05, "epoch": 0.4, "percentage": 4.03, "elapsed_time": "0:01:17", "remaining_time": "0:30:51"}
+{"current_steps": 30, "total_steps": 620, "loss": 1.7883, "lr": 2.258064516129032e-05, "epoch": 0.48, "percentage": 4.84, "elapsed_time": "0:01:23", "remaining_time": "0:27:24"}
+{"current_steps": 35, "total_steps": 620, "loss": 1.5947, "lr": 2.661290322580645e-05, "epoch": 0.56, "percentage": 5.65, "elapsed_time": "0:01:29", "remaining_time": "0:25:00"}
+{"current_steps": 40, "total_steps": 620, "loss": 1.5961, "lr": 3.0645161290322585e-05, "epoch": 0.64, "percentage": 6.45, "elapsed_time": "0:01:34", "remaining_time": "0:22:45"}
+{"current_steps": 45, "total_steps": 620, "loss": 1.596, "lr": 3.467741935483872e-05, "epoch": 0.72, "percentage": 7.26, "elapsed_time": "0:01:38", "remaining_time": "0:20:59"}
+{"current_steps": 50, "total_steps": 620, "loss": 1.7038, "lr": 3.870967741935484e-05, "epoch": 0.8, "percentage": 8.06, "elapsed_time": "0:01:47", "remaining_time": "0:20:27"}
+{"current_steps": 55, "total_steps": 620, "loss": 1.5217, "lr": 4.2741935483870973e-05, "epoch": 0.88, "percentage": 8.87, "elapsed_time": "0:01:53", "remaining_time": "0:19:29"}
+{"current_steps": 60, "total_steps": 620, "loss": 1.6122, "lr": 4.67741935483871e-05, "epoch": 0.96, "percentage": 9.68, "elapsed_time": "0:01:58", "remaining_time": "0:18:23"}
+{"current_steps": 65, "total_steps": 620, "loss": 1.3147, "lr": 4.999960377651517e-05, "epoch": 1.032, "percentage": 10.48, "elapsed_time": "0:02:02", "remaining_time": "0:17:23"}
+{"current_steps": 70, "total_steps": 620, "loss": 1.1399, "lr": 4.998573727324295e-05, "epoch": 1.112, "percentage": 11.29, "elapsed_time": "0:02:06", "remaining_time": "0:16:34"}
+{"current_steps": 75, "total_steps": 620, "loss": 1.268, "lr": 4.9952072153383575e-05, "epoch": 1.192, "percentage": 12.1, "elapsed_time": "0:02:10", "remaining_time": "0:15:51"}
+{"current_steps": 80, "total_steps": 620, "loss": 1.1865, "lr": 4.9898635093068036e-05, "epoch": 1.272, "percentage": 12.9, "elapsed_time": "0:02:16", "remaining_time": "0:15:23"}
+{"current_steps": 85, "total_steps": 620, "loss": 0.9661, "lr": 4.982546843564834e-05, "epoch": 1.3519999999999999, "percentage": 13.71, "elapsed_time": "0:02:21", "remaining_time": "0:14:50"}
+{"current_steps": 90, "total_steps": 620, "loss": 0.9975, "lr": 4.97326301581448e-05, "epoch": 1.432, "percentage": 14.52, "elapsed_time": "0:02:25", "remaining_time": "0:14:18"}
+{"current_steps": 95, "total_steps": 620, "loss": 1.0662, "lr": 4.962019382530521e-05, "epoch": 1.512, "percentage": 15.32, "elapsed_time": "0:02:30", "remaining_time": "0:13:49"}
+{"current_steps": 100, "total_steps": 620, "loss": 1.2219, "lr": 4.948824853131236e-05, "epoch": 1.592, "percentage": 16.13, "elapsed_time": "0:02:34", "remaining_time": "0:13:24"}
+{"current_steps": 105, "total_steps": 620, "loss": 1.2591, "lr": 4.933689882918618e-05, "epoch": 1.6720000000000002, "percentage": 16.94, "elapsed_time": "0:02:39", "remaining_time": "0:13:00"}
+{"current_steps": 110, "total_steps": 620, "loss": 1.1594, "lr": 4.916626464793616e-05, "epoch": 1.752, "percentage": 17.74, "elapsed_time": "0:02:43", "remaining_time": "0:12:38"}
+{"current_steps": 115, "total_steps": 620, "loss": 1.1577, "lr": 4.897648119753006e-05, "epoch": 1.8319999999999999, "percentage": 18.55, "elapsed_time": "0:02:47", "remaining_time": "0:12:17"}
+{"current_steps": 120, "total_steps": 620, "loss": 1.1425, "lr": 4.876769886175396e-05, "epoch": 1.912, "percentage": 19.35, "elapsed_time": "0:02:52", "remaining_time": "0:11:58"}
+{"current_steps": 125, "total_steps": 620, "loss": 1.3166, "lr": 4.8540083079048645e-05, "epoch": 1.992, "percentage": 20.16, "elapsed_time": "0:02:57", "remaining_time": "0:11:41"}
+{"current_steps": 130, "total_steps": 620, "loss": 0.7055, "lr": 4.829381421141671e-05, "epoch": 2.064, "percentage": 20.97, "elapsed_time": "0:03:01", "remaining_time": "0:11:22"}
+{"current_steps": 135, "total_steps": 620, "loss": 0.6371, "lr": 4.802908740150431e-05, "epoch": 2.144, "percentage": 21.77, "elapsed_time": "0:03:05", "remaining_time": "0:11:06"}
+{"current_steps": 140, "total_steps": 620, "loss": 0.6376, "lr": 4.7746112417970766e-05, "epoch": 2.224, "percentage": 22.58, "elapsed_time": "0:03:09", "remaining_time": "0:10:51"}
+{"current_steps": 145, "total_steps": 620, "loss": 0.6777, "lr": 4.7445113489268544e-05, "epoch": 2.304, "percentage": 23.39, "elapsed_time": "0:03:14", "remaining_time": "0:10:36"}
+{"current_steps": 150, "total_steps": 620, "loss": 0.7888, "lr": 4.712632912596538e-05, "epoch": 2.384, "percentage": 24.19, "elapsed_time": "0:03:18", "remaining_time": "0:10:22"}
+{"current_steps": 155, "total_steps": 620, "loss": 0.6546, "lr": 4.6790011931749314e-05, "epoch": 2.464, "percentage": 25.0, "elapsed_time": "0:03:23", "remaining_time": "0:10:09"}
+{"current_steps": 160, "total_steps": 620, "loss": 0.6856, "lr": 4.643642840326627e-05, "epoch": 2.544, "percentage": 25.81, "elapsed_time": "0:03:27", "remaining_time": "0:09:56"}
+{"current_steps": 165, "total_steps": 620, "loss": 0.7111, "lr": 4.60658587189491e-05, "epoch": 2.624, "percentage": 26.61, "elapsed_time": "0:03:32", "remaining_time": "0:09:44"}
+{"current_steps": 170, "total_steps": 620, "loss": 0.7364, "lr": 4.5678596517004966e-05, "epoch": 2.7039999999999997, "percentage": 27.42, "elapsed_time": "0:03:36", "remaining_time": "0:09:32"}
+{"current_steps": 175, "total_steps": 620, "loss": 0.6989, "lr": 4.527494866273753e-05, "epoch": 2.784, "percentage": 28.23, "elapsed_time": "0:03:40", "remaining_time": "0:09:21"}
+{"current_steps": 180, "total_steps": 620, "loss": 0.7114, "lr": 4.48552350053878e-05, "epoch": 2.864, "percentage": 29.03, "elapsed_time": "0:03:45", "remaining_time": "0:09:10"}
+{"current_steps": 185, "total_steps": 620, "loss": 0.7303, "lr": 4.441978812468666e-05, "epoch": 2.944, "percentage": 29.84, "elapsed_time": "0:03:49", "remaining_time": "0:09:00"}
+{"current_steps": 190, "total_steps": 620, "loss": 0.701, "lr": 4.3968953067319777e-05, "epoch": 3.016, "percentage": 30.65, "elapsed_time": "0:03:53", "remaining_time": "0:08:48"}
+{"current_steps": 195, "total_steps": 620, "loss": 0.3767, "lr": 4.350308707351372e-05, "epoch": 3.096, "percentage": 31.45, "elapsed_time": "0:03:58", "remaining_time": "0:08:38"}
+{"current_steps": 200, "total_steps": 620, "loss": 0.3995, "lr": 4.302255929396003e-05, "epoch": 3.176, "percentage": 32.26, "elapsed_time": "0:04:02", "remaining_time": "0:08:29"}
+{"current_steps": 205, "total_steps": 620, "loss": 0.4409, "lr": 4.2527750497301323e-05, "epoch": 3.2560000000000002, "percentage": 33.06, "elapsed_time": "0:04:07", "remaining_time": "0:08:20"}
+{"current_steps": 210, "total_steps": 620, "loss": 0.3631, "lr": 4.201905276841153e-05, "epoch": 3.336, "percentage": 33.87, "elapsed_time": "0:04:11", "remaining_time": "0:08:11"}
+{"current_steps": 215, "total_steps": 620, "loss": 0.3924, "lr": 4.1496869197709146e-05, "epoch": 3.416, "percentage": 34.68, "elapsed_time": "0:04:16", "remaining_time": "0:08:02"}
+{"current_steps": 220, "total_steps": 620, "loss": 0.3611, "lr": 4.096161356174959e-05, "epoch": 3.496, "percentage": 35.48, "elapsed_time": "0:04:20", "remaining_time": "0:07:53"}
+{"current_steps": 225, "total_steps": 620, "loss": 0.357, "lr": 4.0413709995350145e-05, "epoch": 3.576, "percentage": 36.29, "elapsed_time": "0:04:24", "remaining_time": "0:07:45"}
+{"current_steps": 230, "total_steps": 620, "loss": 0.3885, "lr": 3.985359265550682e-05, "epoch": 3.656, "percentage": 37.1, "elapsed_time": "0:04:29", "remaining_time": "0:07:36"}
+{"current_steps": 235, "total_steps": 620, "loss": 0.4416, "lr": 3.928170537736981e-05, "epoch": 3.7359999999999998, "percentage": 37.9, "elapsed_time": "0:04:33", "remaining_time": "0:07:28"}
+{"current_steps": 240, "total_steps": 620, "loss": 0.4006, "lr": 3.869850132254996e-05, "epoch": 3.816, "percentage": 38.71, "elapsed_time": "0:04:38", "remaining_time": "0:07:20"}
+{"current_steps": 245, "total_steps": 620, "loss": 0.3906, "lr": 3.8104442620035e-05, "epoch": 3.896, "percentage": 39.52, "elapsed_time": "0:04:42", "remaining_time": "0:07:12"}
+{"current_steps": 250, "total_steps": 620, "loss": 0.4063, "lr": 3.7500000000000003e-05, "epoch": 3.976, "percentage": 40.32, "elapsed_time": "0:04:46", "remaining_time": "0:07:04"}
+{"current_steps": 255, "total_steps": 620, "loss": 0.2572, "lr": 3.688565242080238e-05, "epoch": 4.048, "percentage": 41.13, "elapsed_time": "0:04:51", "remaining_time": "0:06:56"}
+{"current_steps": 260, "total_steps": 620, "loss": 0.1815, "lr": 3.626188668945683e-05, "epoch": 4.128, "percentage": 41.94, "elapsed_time": "0:04:55", "remaining_time": "0:06:48"}
+{"current_steps": 265, "total_steps": 620, "loss": 0.2017, "lr": 3.562919707589102e-05, "epoch": 4.208, "percentage": 42.74, "elapsed_time": "0:04:59", "remaining_time": "0:06:41"}
+{"current_steps": 270, "total_steps": 620, "loss": 0.2189, "lr": 3.498808492128776e-05, "epoch": 4.288, "percentage": 43.55, "elapsed_time": "0:05:04", "remaining_time": "0:06:34"}
+{"current_steps": 275, "total_steps": 620, "loss": 0.2015, "lr": 3.4339058240823843e-05, "epoch": 4.368, "percentage": 44.35, "elapsed_time": "0:05:08", "remaining_time": "0:06:27"}
+{"current_steps": 280, "total_steps": 620, "loss": 0.2357, "lr": 3.3682631321120504e-05, "epoch": 4.448, "percentage": 45.16, "elapsed_time": "0:05:12", "remaining_time": "0:06:19"}
+{"current_steps": 285, "total_steps": 620, "loss": 0.2102, "lr": 3.301932431272439e-05, "epoch": 4.5280000000000005, "percentage": 45.97, "elapsed_time": "0:05:17", "remaining_time": "0:06:13"}
+{"current_steps": 290, "total_steps": 620, "loss": 0.2129, "lr": 3.234966281794193e-05, "epoch": 4.608, "percentage": 46.77, "elapsed_time": "0:05:21", "remaining_time": "0:06:06"}
+{"current_steps": 295, "total_steps": 620, "loss": 0.2084, "lr": 3.167417747435379e-05, "epoch": 4.688, "percentage": 47.58, "elapsed_time": "0:05:26", "remaining_time": "0:05:59"}
+{"current_steps": 300, "total_steps": 620, "loss": 0.2649, "lr": 3.099340353433946e-05, "epoch": 4.768, "percentage": 48.39, "elapsed_time": "0:05:30", "remaining_time": "0:05:52"}
+{"current_steps": 305, "total_steps": 620, "loss": 0.2304, "lr": 3.0307880440944902e-05, "epoch": 4.848, "percentage": 49.19, "elapsed_time": "0:05:34", "remaining_time": "0:05:45"}
+{"current_steps": 310, "total_steps": 620, "loss": 0.2567, "lr": 2.961815140042974e-05, "epoch": 4.928, "percentage": 50.0, "elapsed_time": "0:05:39", "remaining_time": "0:05:39"}
+{"current_steps": 315, "total_steps": 620, "loss": 0.2244, "lr": 2.892476295183232e-05, "epoch": 5.0, "percentage": 50.81, "elapsed_time": "0:05:43", "remaining_time": "0:05:32"}
+{"current_steps": 320, "total_steps": 620, "loss": 0.1303, "lr": 2.822826453389404e-05, "epoch": 5.08, "percentage": 51.61, "elapsed_time": "0:05:47", "remaining_time": "0:05:25"}
+{"current_steps": 325, "total_steps": 620, "loss": 0.1348, "lr": 2.7529208049685807e-05, "epoch": 5.16, "percentage": 52.42, "elapsed_time": "0:05:51", "remaining_time": "0:05:19"}
+{"current_steps": 330, "total_steps": 620, "loss": 0.1387, "lr": 2.6828147429281902e-05, "epoch": 5.24, "percentage": 53.23, "elapsed_time": "0:05:56", "remaining_time": "0:05:13"}
+{"current_steps": 335, "total_steps": 620, "loss": 0.1303, "lr": 2.612563819082757e-05, "epoch": 5.32, "percentage": 54.03, "elapsed_time": "0:06:00", "remaining_time": "0:05:06"}
+{"current_steps": 340, "total_steps": 620, "loss": 0.1399, "lr": 2.5422237000348276e-05, "epoch": 5.4, "percentage": 54.84, "elapsed_time": "0:06:04", "remaining_time": "0:05:00"}
+{"current_steps": 345, "total_steps": 620, "loss": 0.1398, "lr": 2.4718501230649355e-05, "epoch": 5.48, "percentage": 55.65, "elapsed_time": "0:06:09", "remaining_time": "0:04:54"}
+{"current_steps": 350, "total_steps": 620, "loss": 0.1509, "lr": 2.4014988519655618e-05, "epoch": 5.5600000000000005, "percentage": 56.45, "elapsed_time": "0:06:13", "remaining_time": "0:04:48"}
+{"current_steps": 355, "total_steps": 620, "loss": 0.1339, "lr": 2.331225632854087e-05, "epoch": 5.64, "percentage": 57.26, "elapsed_time": "0:06:17", "remaining_time": "0:04:42"}
+{"current_steps": 360, "total_steps": 620, "loss": 0.1239, "lr": 2.261086149999755e-05, "epoch": 5.72, "percentage": 58.06, "elapsed_time": "0:06:23", "remaining_time": "0:04:37"}
+{"current_steps": 365, "total_steps": 620, "loss": 0.1603, "lr": 2.1911359816996342e-05, "epoch": 5.8, "percentage": 58.87, "elapsed_time": "0:06:28", "remaining_time": "0:04:31"}
+{"current_steps": 370, "total_steps": 620, "loss": 0.1687, "lr": 2.1214305562385592e-05, "epoch": 5.88, "percentage": 59.68, "elapsed_time": "0:06:32", "remaining_time": "0:04:25"}
+{"current_steps": 375, "total_steps": 620, "loss": 0.1581, "lr": 2.0520251079679373e-05, "epoch": 5.96, "percentage": 60.48, "elapsed_time": "0:06:36", "remaining_time": "0:04:19"}
+{"current_steps": 380, "total_steps": 620, "loss": 0.1275, "lr": 1.982974633538232e-05, "epoch": 6.032, "percentage": 61.29, "elapsed_time": "0:06:40", "remaining_time": "0:04:13"}
+{"current_steps": 385, "total_steps": 620, "loss": 0.0924, "lr": 1.914333848319795e-05, "epoch": 6.112, "percentage": 62.1, "elapsed_time": "0:06:45", "remaining_time": "0:04:07"}
+{"current_steps": 390, "total_steps": 620, "loss": 0.095, "lr": 1.8461571430465834e-05, "epoch": 6.192, "percentage": 62.9, "elapsed_time": "0:06:49", "remaining_time": "0:04:01"}
+{"current_steps": 395, "total_steps": 620, "loss": 0.1112, "lr": 1.778498540717124e-05, "epoch": 6.272, "percentage": 63.71, "elapsed_time": "0:06:53", "remaining_time": "0:03:55"}
+{"current_steps": 400, "total_steps": 620, "loss": 0.1241, "lr": 1.711411653786861e-05, "epoch": 6.352, "percentage": 64.52, "elapsed_time": "0:06:58", "remaining_time": "0:03:50"}
+{"current_steps": 405, "total_steps": 620, "loss": 0.1017, "lr": 1.6449496416858284e-05, "epoch": 6.432, "percentage": 65.32, "elapsed_time": "0:07:02", "remaining_time": "0:03:44"}
+{"current_steps": 410, "total_steps": 620, "loss": 0.0963, "lr": 1.5791651686952823e-05, "epoch": 6.5120000000000005, "percentage": 66.13, "elapsed_time": "0:07:07", "remaining_time": "0:03:38"}
+{"current_steps": 415, "total_steps": 620, "loss": 0.1121, "lr": 1.5141103622167041e-05, "epoch": 6.592, "percentage": 66.94, "elapsed_time": "0:07:11", "remaining_time": "0:03:33"}
+{"current_steps": 420, "total_steps": 620, "loss": 0.1037, "lr": 1.4498367714662128e-05, "epoch": 6.672, "percentage": 67.74, "elapsed_time": "0:07:16", "remaining_time": "0:03:27"}
+{"current_steps": 50, "total_steps": 1340, "loss": 1.9027, "lr": 1.791044776119403e-05, "epoch": 0.373134328358209, "percentage": 3.73, "elapsed_time": "0:02:27", "remaining_time": "1:03:27"}
+{"current_steps": 100, "total_steps": 1340, "loss": 1.6079, "lr": 3.619402985074627e-05, "epoch": 0.746268656716418, "percentage": 7.46, "elapsed_time": "0:03:47", "remaining_time": "0:46:56"}
+{"current_steps": 150, "total_steps": 1340, "loss": 1.4856, "lr": 4.998566623293603e-05, "epoch": 1.1194029850746268, "percentage": 11.19, "elapsed_time": "0:05:08", "remaining_time": "0:40:50"}
+{"current_steps": 200, "total_steps": 1340, "loss": 1.2476, "lr": 4.966409127716367e-05, "epoch": 1.4925373134328357, "percentage": 14.93, "elapsed_time": "0:06:13", "remaining_time": "0:35:31"}
+{"current_steps": 250, "total_steps": 1340, "loss": 1.2446, "lr": 4.892468961153105e-05, "epoch": 1.8656716417910446, "percentage": 18.66, "elapsed_time": "0:07:21", "remaining_time": "0:32:04"}
+{"current_steps": 300, "total_steps": 1340, "loss": 0.9294, "lr": 4.777998721000352e-05, "epoch": 2.2388059701492535, "percentage": 22.39, "elapsed_time": "0:08:37", "remaining_time": "0:29:52"}
+{"current_steps": 350, "total_steps": 1340, "loss": 0.7993, "lr": 4.6249376120430114e-05, "epoch": 2.611940298507463, "percentage": 26.12, "elapsed_time": "0:09:44", "remaining_time": "0:27:33"}
+{"current_steps": 400, "total_steps": 1340, "loss": 0.7859, "lr": 4.43587859498839e-05, "epoch": 2.9850746268656714, "percentage": 29.85, "elapsed_time": "0:10:55", "remaining_time": "0:25:40"}
+{"current_steps": 450, "total_steps": 1340, "loss": 0.4616, "lr": 4.214024459924221e-05, "epoch": 3.3582089552238807, "percentage": 33.58, "elapsed_time": "0:12:05", "remaining_time": "0:23:55"}
+{"current_steps": 500, "total_steps": 1340, "loss": 0.479, "lr": 3.9631335688465334e-05, "epoch": 3.7313432835820897, "percentage": 37.31, "elapsed_time": "0:13:13", "remaining_time": "0:22:13"}
+{"current_steps": 550, "total_steps": 1340, "loss": 0.4252, "lr": 3.6874561864164054e-05, "epoch": 4.104477611940299, "percentage": 41.04, "elapsed_time": "0:14:21", "remaining_time": "0:20:36"}
+{"current_steps": 600, "total_steps": 1340, "loss": 0.267, "lr": 3.391662477546432e-05, "epoch": 4.477611940298507, "percentage": 44.78, "elapsed_time": "0:15:28", "remaining_time": "0:19:04"}
+{"current_steps": 650, "total_steps": 1340, "loss": 0.2861, "lr": 3.0807633915874584e-05, "epoch": 4.850746268656716, "percentage": 48.51, "elapsed_time": "0:16:37", "remaining_time": "0:17:38"}
+{"current_steps": 700, "total_steps": 1340, "loss": 0.2329, "lr": 2.7600257733919886e-05, "epoch": 5.223880597014926, "percentage": 52.24, "elapsed_time": "0:17:44", "remaining_time": "0:16:13"}
+{"current_steps": 750, "total_steps": 1340, "loss": 0.1885, "lr": 2.4348831393313763e-05, "epoch": 5.597014925373134, "percentage": 55.97, "elapsed_time": "0:18:51", "remaining_time": "0:14:49"}
+{"current_steps": 800, "total_steps": 1340, "loss": 0.1992, "lr": 2.110843629782583e-05, "epoch": 5.970149253731344, "percentage": 59.7, "elapsed_time": "0:20:02", "remaining_time": "0:13:31"}
+{"current_steps": 850, "total_steps": 1340, "loss": 0.1429, "lr": 1.793396697432839e-05, "epoch": 6.343283582089552, "percentage": 63.43, "elapsed_time": "0:21:15", "remaining_time": "0:12:15"}
+{"current_steps": 900, "total_steps": 1340, "loss": 0.1495, "lr": 1.4879201121666467e-05, "epoch": 6.7164179104477615, "percentage": 67.16, "elapsed_time": "0:22:23", "remaining_time": "0:10:56"}
+{"current_steps": 950, "total_steps": 1340, "loss": 0.1442, "lr": 1.1995888579364551e-05, "epoch": 7.08955223880597, "percentage": 70.9, "elapsed_time": "0:23:31", "remaining_time": "0:09:39"}
+{"current_steps": 1000, "total_steps": 1340, "loss": 0.1243, "lr": 9.332874649668369e-06, "epoch": 7.462686567164179, "percentage": 74.63, "elapsed_time": "0:24:40", "remaining_time": "0:08:23"}
+{"current_steps": 1050, "total_steps": 1340, "loss": 0.1277, "lr": 6.935272624450431e-06, "epoch": 7.835820895522388, "percentage": 78.36, "elapsed_time": "0:25:50", "remaining_time": "0:07:08"}
+{"current_steps": 1100, "total_steps": 1340, "loss": 0.1099, "lr": 4.843699534944257e-06, "epoch": 8.208955223880597, "percentage": 82.09, "elapsed_time": "0:27:02", "remaining_time": "0:05:53"}
+{"current_steps": 1150, "total_steps": 1340, "loss": 0.1104, "lr": 3.0935880712335773e-06, "epoch": 8.582089552238806, "percentage": 85.82, "elapsed_time": "0:28:09", "remaining_time": "0:04:39"}
+{"current_steps": 1200, "total_steps": 1340, "loss": 0.1081, "lr": 1.7145863280547347e-06, "epoch": 8.955223880597014, "percentage": 89.55, "elapsed_time": "0:29:16", "remaining_time": "0:03:24"}
+{"current_steps": 1250, "total_steps": 1340, "loss": 0.097, "lr": 7.300555456321883e-07, "epoch": 9.328358208955224, "percentage": 93.28, "elapsed_time": "0:30:28", "remaining_time": "0:02:11"}
+{"current_steps": 1300, "total_steps": 1340, "loss": 0.1038, "lr": 1.5667435416370225e-07, "epoch": 9.701492537313433, "percentage": 97.01, "elapsed_time": "0:31:34", "remaining_time": "0:00:58"}
+{"current_steps": 1340, "total_steps": 1340, "epoch": 10.0, "percentage": 100.0, "elapsed_time": "0:32:35", "remaining_time": "0:00:00"}
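Each line of trainer_log.jsonl is a standalone JSON record, so the loss curve can be extracted with a plain JSONL read. A minimal sketch, using two abridged sample records from the log above (the real file has 111 records, and the final record carries no "loss" key):

```python
import json

# Abridged sample records copied from trainer_log.jsonl above.
SAMPLE_LOG = "\n".join([
    '{"current_steps": 5, "total_steps": 620, "loss": 2.2782, "epoch": 0.08}',
    '{"current_steps": 1300, "total_steps": 1340, "loss": 0.1038, "epoch": 9.701492537313433}',
])

records = [json.loads(line) for line in SAMPLE_LOG.splitlines()]
# Skip records without a "loss" key (the final summary record has none).
losses = [r["loss"] for r in records if "loss" in r]
print(losses[0], losses[-1])  # training loss falls from 2.2782 to 0.1038
```

Reading the full file the same way (one `json.loads` per line) gives the complete loss and learning-rate curves for plotting.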
trainer_state.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:73bb59698d456038fad4c6f48d881aa394d9c9a7f826e06c0303557672a6a14a
+size 5511
training_args.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b1d8fb8817be2be93bdc32d8f6327a185a8c3ccef26b750afbae7d1ca70b9e49
+size 5752