
dolly-v2-7b-dpo-full-1-epoch-hydrox-safe

This model is a fine-tuned version of databricks/dolly-v2-7b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0371
  • Rewards/chosen: 4.2799
  • Rewards/rejected: -3.8888
  • Rewards/accuracies: 0.9857
  • Rewards/margins: 8.1686
  • Logps/rejected: -598.4040
  • Logps/chosen: -377.1240
  • Logits/rejected: -1.2002
  • Logits/chosen: -1.5171
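The metric names above match those logged by TRL's `DPOTrainer`, and the model name indicates DPO training. However, the card does not name the trainer or the β value used. The following is a minimal sketch of how these quantities are usually defined. It assumes the standard sigmoid DPO loss, a hypothetical β = 0.1, and one summed sequence log-probability per example under both the policy and the frozen reference model:

```python
import math
from statistics import mean


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Batch DPO metrics from summed sequence log-probs (one float per example).

    beta=0.1 is an assumption; the card does not report the value used.
    """
    # Implicit rewards: beta * (policy log-prob - reference log-prob).
    chosen_rewards = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected_rewards = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    return {
        # Sigmoid DPO loss: -log(sigmoid(margin)) == softplus(-margin).
        "loss": mean(softplus(-m) for m in margins),
        "rewards/chosen": mean(chosen_rewards),
        "rewards/rejected": mean(rejected_rewards),
        # Fraction of pairs where the chosen reward beats the rejected reward.
        "rewards/accuracies": mean(float(c > r) for c, r in zip(chosen_rewards, rejected_rewards)),
        "rewards/margins": mean(margins),
        # Logps/* are the policy's own log-probabilities, not the rewards.
        "logps/chosen": mean(policy_chosen),
        "logps/rejected": mean(policy_rejected),
    }
```

Under these definitions, Rewards/margins is the mean of chosen minus rejected reward. With the reported values, 4.2799 - (-3.8888) = 8.1687, which matches the reported 8.1686 up to rounding.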

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
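The batch-size and scheduler entries are related by simple arithmetic. No gradient accumulation is listed, so assuming none was used, the total train batch is 8 per device × 8 devices = 64, and the total eval batch is 4 × 8 = 32. The sketch below reproduces the shape of a linear schedule with warmup, in the style of `transformers.get_linear_schedule_with_warmup`, with warmup steps rounded up from `warmup_ratio * total_steps`. The card does not state the total number of optimizer steps, so `total_steps` is a parameter you must supply. The results table reaches step 2900 at epoch 0.99.

```python
import math


def total_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Effective batch size across all devices (grad_accum_steps=1 is an assumption)."""
    return per_device * num_devices * grad_accum_steps


def linear_warmup_lr(step: int, total_steps: int, peak_lr: float = 5e-7,
                     warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        # Ramp up linearly from 0 to peak_lr during warmup.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr at the end of warmup to 0 at total_steps.
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
```

For example, with a hypothetical `total_steps=1000`, the rate climbs to 5e-7 at step 100 and falls back to 0 at step 1000.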

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.618 | 0.03 | 100 | 0.5642 | 0.6988 | -0.1139 | 0.7424 | 0.8127 | -560.6550 | -412.9344 | -1.1894 | -1.4878 |
| 0.3539 | 0.07 | 200 | 0.3197 | 1.9159 | -0.2730 | 0.8847 | 2.1889 | -562.2463 | -400.7641 | -1.1625 | -1.4800 |
| 0.2287 | 0.1 | 300 | 0.2128 | 2.8057 | -0.5539 | 0.9200 | 3.3596 | -565.0551 | -391.8654 | -1.1361 | -1.4649 |
| 0.158 | 0.14 | 400 | 0.1673 | 3.4556 | -1.0339 | 0.9327 | 4.4895 | -569.8558 | -385.3670 | -1.1300 | -1.4622 |
| 0.1599 | 0.17 | 500 | 0.1397 | 3.7485 | -1.3338 | 0.9461 | 5.0823 | -572.8546 | -382.4376 | -1.1275 | -1.4607 |
| 0.1389 | 0.2 | 600 | 0.1273 | 3.9259 | -1.5111 | 0.9529 | 5.4371 | -574.6277 | -380.6633 | -1.1194 | -1.4519 |
| 0.0778 | 0.24 | 700 | 0.1122 | 4.0699 | -1.8498 | 0.9613 | 5.9197 | -578.0140 | -379.2233 | -1.1302 | -1.4542 |
| 0.0993 | 0.27 | 800 | 0.0975 | 4.2423 | -1.9934 | 0.9663 | 6.2357 | -579.4506 | -377.5001 | -1.1424 | -1.4689 |
| 0.111 | 0.31 | 900 | 0.0907 | 4.3218 | -2.2534 | 0.9697 | 6.5752 | -582.0501 | -376.7048 | -1.1542 | -1.4820 |
| 0.0893 | 0.34 | 1000 | 0.0882 | 4.3878 | -2.2588 | 0.9663 | 6.6466 | -582.1047 | -376.0451 | -1.1497 | -1.4694 |
| 0.079 | 0.37 | 1100 | 0.0840 | 4.4706 | -2.3132 | 0.9689 | 6.7838 | -582.6481 | -375.2164 | -1.1532 | -1.4807 |
| 0.0706 | 0.41 | 1200 | 0.0721 | 4.4319 | -2.6505 | 0.9722 | 7.0824 | -586.0217 | -375.6038 | -1.1667 | -1.4885 |
| 0.0705 | 0.44 | 1300 | 0.0725 | 4.3743 | -2.8717 | 0.9739 | 7.2460 | -588.2330 | -376.1799 | -1.1817 | -1.5001 |
| 0.0537 | 0.48 | 1400 | 0.0648 | 4.3847 | -2.9676 | 0.9756 | 7.3523 | -589.1927 | -376.0760 | -1.1789 | -1.5019 |
| 0.0483 | 0.51 | 1500 | 0.0604 | 4.3761 | -3.2295 | 0.9798 | 7.6056 | -591.8114 | -376.1613 | -1.1923 | -1.5114 |
| 0.0572 | 0.54 | 1600 | 0.0581 | 4.3258 | -3.2641 | 0.9773 | 7.5899 | -592.1575 | -376.6645 | -1.1855 | -1.5042 |
| 0.066 | 0.58 | 1700 | 0.0539 | 4.3270 | -3.3813 | 0.9815 | 7.7083 | -593.3289 | -376.6523 | -1.1886 | -1.5110 |
| 0.0561 | 0.61 | 1800 | 0.0501 | 4.3859 | -3.3980 | 0.9798 | 7.7839 | -593.4964 | -376.0636 | -1.1948 | -1.5144 |
| 0.0538 | 0.65 | 1900 | 0.0504 | 4.4209 | -3.4478 | 0.9815 | 7.8687 | -593.9944 | -375.7137 | -1.2036 | -1.5147 |
| 0.0493 | 0.68 | 2000 | 0.0472 | 4.3835 | -3.5804 | 0.9832 | 7.9639 | -595.3203 | -376.0873 | -1.1925 | -1.5071 |
| 0.0374 | 0.71 | 2100 | 0.0449 | 4.2972 | -3.7998 | 0.9840 | 8.0970 | -597.5147 | -376.9510 | -1.2020 | -1.5166 |
| 0.0475 | 0.75 | 2200 | 0.0442 | 4.3073 | -3.6486 | 0.9840 | 7.9559 | -596.0024 | -376.8494 | -1.1992 | -1.5177 |
| 0.0407 | 0.78 | 2300 | 0.0408 | 4.3011 | -3.7981 | 0.9882 | 8.0992 | -597.4978 | -376.9122 | -1.2078 | -1.5242 |
| 0.0386 | 0.82 | 2400 | 0.0397 | 4.3423 | -3.7314 | 0.9882 | 8.0737 | -596.8302 | -376.4996 | -1.2029 | -1.5133 |
| 0.0504 | 0.85 | 2500 | 0.0390 | 4.3732 | -3.7690 | 0.9857 | 8.1422 | -597.2065 | -376.1912 | -1.2024 | -1.5188 |
| 0.0402 | 0.88 | 2600 | 0.0377 | 4.3358 | -3.8299 | 0.9865 | 8.1656 | -597.8150 | -376.5649 | -1.1977 | -1.5158 |
| 0.038 | 0.92 | 2700 | 0.0397 | 4.3284 | -3.8383 | 0.9891 | 8.1667 | -597.8990 | -376.6386 | -1.2033 | -1.5139 |
| 0.0527 | 0.95 | 2800 | 0.0383 | 4.2985 | -3.8490 | 0.9857 | 8.1475 | -598.0059 | -376.9374 | -1.2037 | -1.5196 |
| 0.0365 | 0.99 | 2900 | 0.0379 | 4.3086 | -3.8349 | 0.9874 | 8.1435 | -597.8653 | -376.8369 | -1.1997 | -1.5156 |
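To compare checkpoints, it can help to parse the logged rows instead of reading them by eye. Below is a minimal sketch. It assumes each row is a whitespace-separated string in the column order of the table above. For brevity, only a few rows are copied in, and the snake_case column keys are illustrative names, not ones used by any logging library:

```python
# Column order of the training-results table (illustrative key names).
COLUMNS = [
    "train_loss", "epoch", "step", "val_loss",
    "rewards_chosen", "rewards_rejected", "rewards_accuracies", "rewards_margins",
    "logps_rejected", "logps_chosen", "logits_rejected", "logits_chosen",
]

# A subset of rows copied from the table.
ROWS = """\
0.618 0.03 100 0.5642 0.6988 -0.1139 0.7424 0.8127 -560.6550 -412.9344 -1.1894 -1.4878
0.0493 0.68 2000 0.0472 4.3835 -3.5804 0.9832 7.9639 -595.3203 -376.0873 -1.1925 -1.5071
0.0402 0.88 2600 0.0377 4.3358 -3.8299 0.9865 8.1656 -597.8150 -376.5649 -1.1977 -1.5158
0.038 0.92 2700 0.0397 4.3284 -3.8383 0.9891 8.1667 -597.8990 -376.6386 -1.2033 -1.5139
0.0365 0.99 2900 0.0379 4.3086 -3.8349 0.9874 8.1435 -597.8653 -376.8369 -1.1997 -1.5156
"""


def parse_rows(text: str) -> list[dict]:
    """Turn whitespace-separated log rows into dicts keyed by column name."""
    records = []
    for line in text.strip().splitlines():
        values = [float(v) for v in line.split()]
        record = dict(zip(COLUMNS, values))
        record["step"] = int(record["step"])  # step is an integer counter
        records.append(record)
    return records


records = parse_rows(ROWS)
best = min(records, key=lambda r: r["val_loss"])
print(best["step"], best["val_loss"])  # 2600 0.0377
```

Across the full table, the lowest logged validation loss (0.0377) is at step 2600, and the highest Rewards/accuracies (0.9891) is at step 2700.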

Framework versions

  • Transformers 4.35.0
  • Pytorch 2.1.1+cu121
  • Datasets 2.14.6
  • Tokenizers 0.14.1
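Before trying to reproduce the run, you can compare an environment against the versions listed above with a small standard-library check. It assumes the PyPI distribution names `transformers`, `torch`, `datasets` and `tokenizers`. Exact string matching is deliberately strict, so a local build tag such as `+cu121` counts as a difference:

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions" in this card.
EXPECTED = {
    "transformers": "4.35.0",
    "torch": "2.1.1+cu121",
    "datasets": "2.14.6",
    "tokenizers": "0.14.1",
}


def version_mismatches(expected=EXPECTED, lookup=version):
    """Return {package: (expected, installed_or_None)} for each package that differs."""
    mismatches = {}
    for pkg, want in expected.items():
        try:
            have = lookup(pkg)
        except PackageNotFoundError:
            have = None  # package is not installed
        if have != want:
            mismatches[pkg] = (want, have)
    return mismatches
```

Calling `version_mismatches()` returns an empty dict when the installed versions match the card exactly.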

Safetensors

  • Model size: 6.86B params
  • Tensor type: BF16

Model tree for yihang7/dolly-v2-7b-dpo-full-1-epoch-hydrox-safe: this model is fine-tuned from databricks/dolly-v2-7b.