silanm committed on
Commit 3f17b9f · verified · 1 Parent(s): 70e6781

Update README.md

Files changed (1)
  1. README.md +80 -74
README.md CHANGED
@@ -1,74 +1,80 @@
 ---
 library_name: transformers
-license: mit
 base_model: gpt2
 tags:
 - trl
 - dpo
 - generated_from_trainer
 model-index:
 - name: results_orca_dpo_wandb
   results: []
+datasets:
+- argilla/distilabel-intel-orca-dpo-pairs
+language:
+- en
+metrics:
+- accuracy
+pipeline_tag: question-answering
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
 # results_orca_dpo_wandb
 
 This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.6150
 - Rewards/chosen: -0.2701
 - Rewards/rejected: -2.5585
 - Rewards/accuracies: 0.7940
 - Rewards/margins: 2.2885
 - Logps/rejected: -425.4867
 - Logps/chosen: -344.9728
 - Logits/rejected: -76.3682
 - Logits/chosen: -76.4329
 
 ## Model description
 
 More information needed
 
 ## Intended uses & limitations
 
 More information needed
 
 ## Training and evaluation data
 
 More information needed
 
 ## Training procedure
 
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
 - learning_rate: 0.0001
 - train_batch_size: 4
 - eval_batch_size: 8
 - seed: 42
 - gradient_accumulation_steps: 2
 - total_train_batch_size: 8
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 50
 - training_steps: 200
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
 | 0.9158 | 0.0346 | 50 | 0.7779 | -1.2673 | -3.1650 | 0.7319 | 1.8977 | -431.5514 | -354.9452 | -99.3161 | -98.3339 |
 | 1.2481 | 0.0691 | 100 | 0.9942 | -3.1400 | -6.5742 | 0.7368 | 3.4342 | -465.6436 | -373.6723 | -86.8154 | -86.6002 |
 | 0.6814 | 0.1037 | 150 | 0.7237 | -0.3674 | -2.6648 | 0.7635 | 2.2974 | -426.5488 | -345.9457 | -75.5469 | -75.8445 |
 | 0.6615 | 0.1382 | 200 | 0.6150 | -0.2701 | -2.5585 | 0.7940 | 2.2885 | -425.4867 | -344.9728 | -76.3682 | -76.4329 |
 
 
 ### Framework versions
 
 - Transformers 4.45.0
 - Pytorch 2.4.0+cu124
 - Datasets 3.2.0
 - Tokenizers 0.20.3
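
For readers reconstructing the setup: the Rewards/* columns in the card are DPO's implicit rewards, beta * (log pi_theta(y|x) - log pi_ref(y|x)), and the margin is chosen minus rejected. Below is a minimal, untested sketch of a comparable run with TRL's `DPOTrainer`, mirroring the hyperparameters the card reports. It assumes the `argilla/distilabel-intel-orca-dpo-pairs` dataset named in the updated front matter, maps that dataset's `input` column to the `prompt` field DPO training expects, and infers `report_to="wandb"` from the model name; none of this is confirmed by the card, and `DPOTrainer` keyword names vary across TRL versions.

```python
# Illustrative sketch only -- not the author's training script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Assumed dataset and column mapping: "input" -> "prompt";
# "chosen"/"rejected" already hold the preferred/dispreferred responses.
dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")
dataset = dataset.rename_column("input", "prompt")

config = DPOConfig(
    output_dir="results_orca_dpo_wandb",
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # total train batch size 8
    lr_scheduler_type="linear",
    warmup_steps=50,
    max_steps=200,
    seed=42,
    beta=0.1,                       # TRL default; not reported by the card
    report_to="wandb",              # inferred from the repo name; an assumption
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,       # TRL clones the model as the frozen reference
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,  # newer TRL versions call this `processing_class`
)
trainer.train()
```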
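Since the "Intended uses & limitations" section is still unfilled, here is only a generic usage sketch for a GPT-2-based checkpoint like this one. Despite the front matter's `question-answering` tag, a causal-LM checkpoint loads under the `text-generation` pipeline, and the repo id `silanm/results_orca_dpo_wandb` is an assumption inferred from the commit author and model name, not stated in the card.

```python
# Usage sketch; the repo id below is an assumption.
from transformers import pipeline

generator = pipeline("text-generation", model="silanm/results_orca_dpo_wandb")
out = generator("Explain direct preference optimization in one sentence:",
                max_new_tokens=50)
print(out[0]["generated_text"])
```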