# tinyllama-1.1B-intermediate-step-715k-1.5T-dpo-lora-v4
This model is a DPO fine-tune (via a LoRA adapter) of TinyLlama-1.1B-intermediate-step-715k-1.5T; the preference dataset used for training is not documented. It achieves the following results on the evaluation set (a sketch of how these DPO statistics are computed follows the list):
- Loss: 0.6904
- Rewards/chosen: -3.5271
- Rewards/rejected: -5.6475
- Rewards/accuracies: 0.7393
- Rewards/margins: 2.1205
- Logps/rejected: -394.1334
- Logps/chosen: -478.6117
- Logits/rejected: -3.8937
- Logits/chosen: -4.0184
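For reference, the reward columns above are the implicit DPO rewards: beta-scaled log-probability ratios of the policy against the frozen reference model. The sketch below shows how these statistics are typically computed in the TRL-style formulation; the `beta` value is an assumption, since the card does not record it.

```python
import torch
import torch.nn.functional as F

def dpo_statistics(policy_chosen_logps: torch.Tensor,
                   policy_rejected_logps: torch.Tensor,
                   ref_chosen_logps: torch.Tensor,
                   ref_rejected_logps: torch.Tensor,
                   beta: float = 0.1):  # beta is an assumption, not from the card
    """Compute the DPO loss and the implicit reward metrics reported above."""
    # Implicit rewards: beta-scaled log-prob ratios against the frozen reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards           # rewards/margins
    accuracies = (margins > 0).float().mean()             # rewards/accuracies
    loss = -F.logsigmoid(margins).mean()                  # standard DPO objective
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), accuracies
```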
## Model description
More information needed
## Intended uses & limitations
More information needed
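Pending proper documentation, here is a minimal inference sketch. The base-model id is inferred from the card's title, and `ADAPTER_ID` is a hypothetical placeholder for this repository; both are assumptions, not confirmed by the card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Assumed identifiers: base model inferred from the card title,
# ADAPTER_ID is a hypothetical placeholder for this repository.
BASE_ID = "TinyLlama/TinyLlama-1.1B-intermediate-step-715k-1.5T"
ADAPTER_ID = "your-namespace/tinyllama-1.1B-intermediate-step-715k-1.5T-dpo-lora-v4"

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
model = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, ADAPTER_ID)  # attach the DPO LoRA adapter

prompt = "Explain what direct preference optimization does."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```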
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a hedged reconstruction as `TrainingArguments` follows the list):
- learning_rate: 0.001
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 32
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.02
- num_epochs: 3
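A minimal sketch expressing these settings as `transformers.TrainingArguments`; the output directory is a placeholder, and the optimizer line relies on the library defaults matching the Adam settings named above.

```python
from transformers import TrainingArguments

# Hedged reconstruction of the hyperparameters listed above.
# output_dir is a placeholder, not taken from the card.
training_args = TrainingArguments(
    output_dir="tinyllama-dpo-lora-v4",
    learning_rate=1e-3,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=32,  # 2 per device x 32 steps = reported total of 64
    optim="adamw_torch",             # Adam with betas=(0.9, 0.999), eps=1e-8 (defaults)
    lr_scheduler_type="linear",
    warmup_ratio=0.02,
    num_train_epochs=3,
)
```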
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5491 | 0.34 | 300 | 0.5719 | -0.5176 | -1.3357 | 0.7015 | 0.8181 | -351.0149 | -448.5167 | -4.0592 | -4.2257 |
| 0.5906 | 0.68 | 600 | 0.5625 | -0.3365 | -1.2779 | 0.7191 | 0.9414 | -350.4370 | -446.7061 | -4.0731 | -4.2239 |
| 0.2857 | 1.02 | 900 | 0.5723 | -0.3882 | -1.5979 | 0.7141 | 1.2097 | -353.6368 | -447.2226 | -4.0753 | -4.2332 |
| 0.2679 | 1.36 | 1200 | 0.5883 | -1.1630 | -2.3423 | 0.7234 | 1.1793 | -361.0811 | -454.9714 | -4.0115 | -4.1888 |
| 0.231 | 1.71 | 1500 | 0.5895 | -1.3278 | -2.7966 | 0.7338 | 1.4688 | -365.6242 | -456.6194 | -4.0069 | -4.1696 |
| 0.0862 | 2.05 | 1800 | 0.6626 | -2.7764 | -4.6708 | 0.7284 | 1.8944 | -384.3661 | -471.1047 | -3.9624 | -4.0992 |
| 0.0804 | 2.39 | 2100 | 0.6818 | -3.0330 | -5.1156 | 0.7410 | 2.0826 | -388.8140 | -473.6706 | -3.9128 | -4.0467 |
| 0.0925 | 2.73 | 2400 | 0.6947 | -3.5621 | -5.6537 | 0.7371 | 2.0916 | -394.1956 | -478.9623 | -3.8908 | -4.0137 |
### Framework versions
- Transformers 4.35.0
- Pytorch 2.1.0+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1