silanm committed
Commit 9c2eccf · verified · 1 Parent(s): 3f17b9f

End of training

Files changed (3)
  1. README.md +80 -80
  2. model.safetensors +1 -1
  3. training_args.bin +1 -1
README.md CHANGED
@@ -1,80 +1,80 @@
- ---
- library_name: transformers
- base_model: gpt2
- tags:
- - trl
- - dpo
- - generated_from_trainer
- model-index:
- - name: results_orca_dpo_wandb
-   results: []
- datasets:
- - argilla/distilabel-intel-orca-dpo-pairs
- language:
- - en
- metrics:
- - accuracy
- pipeline_tag: question-answering
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # results_orca_dpo_wandb
-
- This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.6150
- - Rewards/chosen: -0.2701
- - Rewards/rejected: -2.5585
- - Rewards/accuracies: 0.7940
- - Rewards/margins: 2.2885
- - Logps/rejected: -425.4867
- - Logps/chosen: -344.9728
- - Logits/rejected: -76.3682
- - Logits/chosen: -76.4329
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0001
- - train_batch_size: 4
- - eval_batch_size: 8
- - seed: 42
- - gradient_accumulation_steps: 2
- - total_train_batch_size: 8
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 50
- - training_steps: 200
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
- |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
- | 0.9158 | 0.0346 | 50 | 0.7779 | -1.2673 | -3.1650 | 0.7319 | 1.8977 | -431.5514 | -354.9452 | -99.3161 | -98.3339 |
- | 1.2481 | 0.0691 | 100 | 0.9942 | -3.1400 | -6.5742 | 0.7368 | 3.4342 | -465.6436 | -373.6723 | -86.8154 | -86.6002 |
- | 0.6814 | 0.1037 | 150 | 0.7237 | -0.3674 | -2.6648 | 0.7635 | 2.2974 | -426.5488 | -345.9457 | -75.5469 | -75.8445 |
- | 0.6615 | 0.1382 | 200 | 0.6150 | -0.2701 | -2.5585 | 0.7940 | 2.2885 | -425.4867 | -344.9728 | -76.3682 | -76.4329 |
-
-
- ### Framework versions
-
- - Transformers 4.45.0
- - Pytorch 2.4.0+cu124
- - Datasets 3.2.0
- - Tokenizers 0.20.3
+ ---
+ library_name: transformers
+ license: mit
+ base_model: gpt2
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ model-index:
+ - name: nlp-a5
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # nlp-a5
+
+ This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.6409
+ - Rewards/chosen: 0.9778
+ - Rewards/rejected: -2.1491
+ - Rewards/accuracies: 0.8235
+ - Rewards/margins: 3.1270
+ - Logps/rejected: -410.6469
+ - Logps/chosen: -337.3829
+ - Logits/rejected: -66.9816
+ - Logits/chosen: -67.8481
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5.38e-05
+ - train_batch_size: 8
+ - eval_batch_size: 8
+ - seed: 42
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 32
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_steps: 50
+ - training_steps: 500
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.6454 | 0.1382 | 50 | 0.7701 | 0.4667 | -1.3878 | 0.7591 | 1.8546 | -406.8403 | -339.9385 | -95.7163 | -95.3393 |
+ | 0.7265 | 0.2764 | 100 | 0.7531 | 0.2791 | -2.1548 | 0.7777 | 2.4339 | -410.6752 | -340.8765 | -85.4456 | -85.2691 |
+ | 0.5317 | 0.4147 | 150 | 0.7164 | 0.0401 | -2.6230 | 0.7743 | 2.6631 | -413.0164 | -342.0717 | -77.7900 | -78.4781 |
+ | 0.8947 | 0.5529 | 200 | 0.7223 | -0.0327 | -3.1585 | 0.7961 | 3.1258 | -415.6938 | -342.4356 | -73.7223 | -74.3845 |
+ | 0.6882 | 0.6911 | 250 | 0.6677 | 0.6186 | -2.0402 | 0.7904 | 2.6588 | -410.1023 | -339.1790 | -66.4183 | -67.2267 |
+ | 0.4596 | 0.8293 | 300 | 0.6199 | 0.5863 | -2.4937 | 0.8116 | 3.0800 | -412.3698 | -339.3405 | -66.5151 | -67.2825 |
+ | 0.6719 | 0.9675 | 350 | 0.6214 | 1.1018 | -1.4390 | 0.7842 | 2.5408 | -407.0965 | -336.7633 | -64.9415 | -65.8130 |
+ | 0.119 | 1.1057 | 400 | 0.6442 | 0.4069 | -2.8694 | 0.8282 | 3.2763 | -414.2482 | -340.2375 | -64.6611 | -65.4554 |
+ | 0.1427 | 1.2440 | 450 | 0.6730 | 1.1133 | -1.9897 | 0.8131 | 3.1030 | -409.8499 | -336.7056 | -65.8348 | -66.7287 |
+ | 0.1022 | 1.3822 | 500 | 0.6409 | 0.9778 | -2.1491 | 0.8235 | 3.1270 | -410.6469 | -337.3829 | -66.9816 | -67.8481 |
+
+
+ ### Framework versions
+
+ - Transformers 4.45.0
+ - Pytorch 2.4.0+cu124
+ - Datasets 3.2.0
+ - Tokenizers 0.20.3
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:4551f55154fd9a37b198bafe2fb7c78ad3f95ceead94d91f9db36f3ec335b9f9
+ oid sha256:a1e16b123564dc528a9ce6eaeb660b75bfeeb6232c6795e79859ce272d72931d
  size 497774208
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:fa18c008df102c7b6aa7f2a6644bb25a404d3c2cf749678f40d55277cfaadac4
+ oid sha256:c744629b47dbf6cb600bb728b569cde2ab72eb172d814be8a42decb6f71ff528
  size 5176
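
For readers trying to reproduce the run described in the updated card, the hyperparameters it lists map onto TRL's DPO API roughly as sketched below. This is a minimal sketch under stated assumptions, not the script behind this commit: the dataset (argilla/distilabel-intel-orca-dpo-pairs, taken from the old card's metadata, since the new card lists none), the prompt-column rename, and the `output_dir` are guesses; only the numeric hyperparameters come from the card.

```python
# Minimal sketch of a TRL DPO run matching the updated card's hyperparameters.
# Assumptions: dataset choice, column names, and output_dir; the numbers are
# read directly from the card's "Training hyperparameters" list.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

# The old card's metadata pointed at this dataset; the new card drops it.
dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")
# DPOTrainer expects "prompt"/"chosen"/"rejected" columns; this dataset names
# the prompt column "input" (an assumption worth checking against the schema).
dataset = dataset.rename_column("input", "prompt")

args = DPOConfig(
    output_dir="nlp-a5",            # matches the card's model-index name
    learning_rate=5.38e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # 8 x 4 = total train batch size 32
    lr_scheduler_type="linear",
    warmup_steps=50,
    max_steps=500,                  # training_steps: 500
    seed=42,
)

trainer = DPOTrainer(
    model=model,                    # reference model is created internally if omitted
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

The card records Transformers 4.45.0 but no TRL version, so the `DPOConfig`/`DPOTrainer` usage above assumes a TRL release contemporary with it (around 0.11), where `DPOTrainer` still accepts a `tokenizer` argument. The Adam settings in the card are the `TrainingArguments` defaults, so they need no explicit flags.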