---
library_name: transformers
tags:
- trl
- dpo
- alignment-handbook
- generated_from_trainer
model-index:
- name: OpenELM-1_1B-DPO-full-max-10-reward
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# OpenELM-1_1B-DPO-full-max-10-reward

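This model was trained with DPO (per the `trl` and `dpo` tags) on an unknown dataset; the base checkpoint it was fine-tuned from is not recorded in this card.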
It achieves the following results on the evaluation set at the final logged checkpoint (step 2800):
- Loss: 1.5022
- Rewards/chosen: -12.0
- Rewards/rejected: -14.25
- Rewards/accuracies: 0.5996
- Rewards/margins: 2.2188
- Logps/rejected: -1712.0
- Logps/chosen: -1520.0
- Logits/rejected: -1.7422
- Logits/chosen: -3.6875

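As a rough guide to reading these metrics: in the sigmoid form of DPO, the implicit reward of a response is β times the gap between the policy's and the reference model's summed log-probability for it. `Rewards/margins` is the chosen reward minus the rejected reward, `Rewards/accuracies` is the share of pairs where the chosen reward is higher, and the loss is −log σ(margin). Negative rewards on both sides mean the policy assigns lower probability than the reference to chosen and rejected responses alike, while lowering the rejected ones more. The sketch below assumes the sigmoid loss and β = 0.1 (TRL's default); the card states neither, and `dpo_metrics` is a hypothetical helper, not a TRL function.

```python
import math
from statistics import fmean


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Batch-averaged sigmoid-DPO loss and reward statistics (hypothetical helper).

    Each argument is a list of summed per-sequence log-probabilities, one entry
    per preference pair. beta=0.1 is an assumption, not a value from this card.
    """
    # Implicit reward: beta * (log pi_policy(y|x) - log pi_ref(y|x))
    chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen, rejected)]
    return {
        # -log(sigmoid(m)) == softplus(-m), computed stably
        "loss": fmean(softplus(-m) for m in margins),
        "rewards/chosen": fmean(chosen),
        "rewards/rejected": fmean(rejected),
        "rewards/margins": fmean(margins),
        # Fraction of pairs where the chosen reward beats the rejected reward
        "rewards/accuracies": fmean(1.0 if c > r else 0.0 for c, r in zip(chosen, rejected)),
    }
```

For example, `dpo_metrics([-10.0], [-20.0], [-12.0], [-18.0])` gives a chosen reward of 0.2, a rejected reward of −0.2 and a margin of 0.4. The values in the card are averages of such per-pair quantities over the evaluation set.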
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3

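The totals above follow from 8 pairs per device × 4 devices × 2 accumulation steps = 64 for training, and 16 × 4 = 64 for evaluation (no accumulation). The sketch below reproduces that arithmetic and a cosine schedule with linear warmup, intended to follow the shape of `transformers`' `get_cosine_schedule_with_warmup`. The total step count is not logged in this card: the Epoch column in the results table (2.9319 at step 2800) suggests about 955 optimizer steps per epoch, or roughly 2,865 over 3 epochs, which is an inference used here as an assumption.

```python
import math


def effective_batch_sizes(per_device_train=8, per_device_eval=16,
                          num_devices=4, grad_accum_steps=2):
    """Global batch sizes implied by the hyperparameters listed above."""
    total_train = per_device_train * num_devices * grad_accum_steps
    total_eval = per_device_eval * num_devices  # no accumulation during eval
    return total_train, total_eval


def cosine_lr_with_warmup(step, total_steps, peak_lr=5e-5, warmup_ratio=0.1):
    """Linear warmup to peak_lr, then half-cosine decay to zero.

    Warmup length is ceil(total_steps * warmup_ratio).
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: estimated from the results table (epoch 2.9319 at step 2800).
STEPS_PER_EPOCH = round(2800 / 2.9319)  # 955
TOTAL_STEPS = 3 * STEPS_PER_EPOCH       # 2865
```

Under these assumptions warmup lasts 287 steps, the rate peaks at 5e-05, falls to half of that around step 1576 and reaches zero at the final step.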
### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.3683        | 0.1047 | 100  | 0.6799          | -1.8125        | -2.1719          | 0.6016             | 0.3555          | -506.0         | -500.0       | -12.125         | -12.4375      |
| 0.2926        | 0.2094 | 200  | 0.7127          | -2.0312        | -2.5156          | 0.6152             | 0.4863          | -540.0         | -520.0       | -10.0           | -10.5625      |
| 0.2695        | 0.3141 | 300  | 0.7960          | -4.5938        | -5.1562          | 0.5801             | 0.5781          | -804.0         | -776.0       | -7.2812         | -8.1875       |
| 0.245         | 0.4188 | 400  | 0.7903          | -4.6562        | -5.25            | 0.5801             | 0.5977          | -812.0         | -784.0       | -8.75           | -9.5625       |
| 0.2375        | 0.5236 | 500  | 0.9612          | -6.75          | -7.875           | 0.6113             | 1.125           | -1080.0        | -992.0       | -7.4688         | -8.6875       |
| 0.2534        | 0.6283 | 600  | 0.8573          | -5.6562        | -6.5             | 0.6133             | 0.8438          | -940.0         | -884.0       | -8.75           | -9.6875       |
| 0.2213        | 0.7330 | 700  | 0.8133          | -4.7812        | -5.7188          | 0.6387             | 0.9336          | -860.0         | -796.0       | -5.75           | -7.3125       |
| 0.2342        | 0.8377 | 800  | 0.8574          | -5.5625        | -6.4688          | 0.6055             | 0.9336          | -936.0         | -872.0       | -7.2812         | -8.5625       |
| 0.199         | 0.9424 | 900  | 0.8853          | -7.1875        | -8.1875          | 0.6074             | 0.9570          | -1104.0        | -1040.0      | -4.6562         | -6.0938       |
| 0.0529        | 1.0471 | 1000 | 1.1147          | -8.5           | -9.75            | 0.6055             | 1.2734          | -1264.0        | -1168.0      | -4.4062         | -6.2188       |
| 0.058         | 1.1518 | 1100 | 1.0443          | -6.25          | -7.25            | 0.5977             | 1.0             | -1012.0        | -940.0       | -7.9375         | -9.1875       |
| 0.0436        | 1.2565 | 1200 | 1.1756          | -9.5625        | -10.875          | 0.6133             | 1.3438          | -1376.0        | -1272.0      | -1.3125         | -3.0938       |
| 0.0353        | 1.3613 | 1300 | 1.2987          | -8.75          | -10.4375         | 0.5859             | 1.6875          | -1328.0        | -1192.0      | -5.2812         | -7.0625       |
| 0.0576        | 1.4660 | 1400 | 1.0486          | -8.0625        | -9.5625          | 0.6172             | 1.4609          | -1240.0        | -1128.0      | -4.625          | -6.4688       |
| 0.0444        | 1.5707 | 1500 | 1.1459          | -8.875         | -10.5            | 0.6113             | 1.6484          | -1344.0        | -1208.0      | -1.9141         | -3.9219       |
| 0.0475        | 1.6754 | 1600 | 1.1818          | -8.5625        | -10.125          | 0.5918             | 1.5547          | -1304.0        | -1176.0      | -2.5938         | -4.5625       |
| 0.0644        | 1.7801 | 1700 | 1.2222          | -9.625         | -11.25           | 0.6055             | 1.6562          | -1416.0        | -1280.0      | -2.7344         | -4.5938       |
| 0.0397        | 1.8848 | 1800 | 1.0832          | -7.8125        | -9.375           | 0.6172             | 1.5469          | -1224.0        | -1096.0      | -3.3438         | -5.375        |
| 0.0254        | 1.9895 | 1900 | 1.1882          | -9.8125        | -11.4375         | 0.6191             | 1.6719          | -1432.0        | -1296.0      | -3.7344         | -5.4688       |
| 0.0037        | 2.0942 | 2000 | 1.3353          | -11.125        | -13.125          | 0.6133             | 1.9766          | -1600.0        | -1432.0      | -2.5938         | -4.5312       |
| 0.0048        | 2.1990 | 2100 | 1.5185          | -12.1875       | -14.375          | 0.5996             | 2.2031          | -1728.0        | -1536.0      | -2.7656         | -4.7188       |
| 0.0045        | 2.3037 | 2200 | 1.5012          | -12.4375       | -14.625          | 0.6133             | 2.1875          | -1752.0        | -1560.0      | -1.75           | -3.6406       |
| 0.0108        | 2.4084 | 2300 | 1.5281          | -12.3125       | -14.5625         | 0.6074             | 2.2344          | -1744.0        | -1552.0      | -1.8047         | -3.75         |
| 0.0056        | 2.5131 | 2400 | 1.5154          | -12.125        | -14.3125         | 0.6074             | 2.2188          | -1720.0        | -1528.0      | -1.6797         | -3.625        |
| 0.0051        | 2.6178 | 2500 | 1.5115          | -12.1875       | -14.4375         | 0.6035             | 2.2188          | -1728.0        | -1536.0      | -1.5234         | -3.4531       |
| 0.0041        | 2.7225 | 2600 | 1.4846          | -11.8125       | -14.0625         | 0.5938             | 2.2031          | -1696.0        | -1504.0      | -1.8047         | -3.75         |
| 0.0049        | 2.8272 | 2700 | 1.5020          | -12.0          | -14.25           | 0.5977             | 2.2344          | -1712.0        | -1520.0      | -1.7266         | -3.6719       |
| 0.0063        | 2.9319 | 2800 | 1.5022          | -12.0          | -14.25           | 0.5996             | 2.2188          | -1712.0        | -1520.0      | -1.7422         | -3.6875       |

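In the table, training loss falls to near zero during the third epoch while validation loss rises from 0.6799 at step 100 to about 1.50, and reward accuracy stays between roughly 0.58 and 0.64; this pattern is consistent with overfitting to the preference data. If intermediate checkpoints were kept (the card does not say), a small script can pick candidates by metric. The sketch below transcribes the validation loss and reward accuracy columns; `EVAL_LOG` and `best_checkpoints` are hypothetical names.

```python
# (step, validation loss, rewards/accuracies), transcribed from the table above
EVAL_LOG = [
    (100, 0.6799, 0.6016), (200, 0.7127, 0.6152), (300, 0.7960, 0.5801),
    (400, 0.7903, 0.5801), (500, 0.9612, 0.6113), (600, 0.8573, 0.6133),
    (700, 0.8133, 0.6387), (800, 0.8574, 0.6055), (900, 0.8853, 0.6074),
    (1000, 1.1147, 0.6055), (1100, 1.0443, 0.5977), (1200, 1.1756, 0.6133),
    (1300, 1.2987, 0.5859), (1400, 1.0486, 0.6172), (1500, 1.1459, 0.6113),
    (1600, 1.1818, 0.5918), (1700, 1.2222, 0.6055), (1800, 1.0832, 0.6172),
    (1900, 1.1882, 0.6191), (2000, 1.3353, 0.6133), (2100, 1.5185, 0.5996),
    (2200, 1.5012, 0.6133), (2300, 1.5281, 0.6074), (2400, 1.5154, 0.6074),
    (2500, 1.5115, 0.6035), (2600, 1.4846, 0.5938), (2700, 1.5020, 0.5977),
    (2800, 1.5022, 0.5996),
]


def best_checkpoints(log):
    """Return (step with lowest validation loss, step with highest accuracy)."""
    lowest_loss = min(log, key=lambda row: row[1])
    highest_acc = max(log, key=lambda row: row[2])
    return lowest_loss[0], highest_acc[0]


print(best_checkpoints(EVAL_LOG))  # (100, 700)
```

The two criteria disagree here: validation loss favors step 100, while reward accuracy peaks at step 700 (0.6387), so which checkpoint is preferable depends on the downstream evaluation.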

### Framework versions

- Transformers 4.45.1
- Pytorch 2.3.0
- Datasets 3.0.1
- Tokenizers 0.20.0