
llama3_8b_instruct_dpo_bwgenerator_v2

This model is a fine-tuned version of NanQiangHF/llama3_8b_instruct_bwgenerator on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3494
  • Rewards/chosen: -0.5156
  • Rewards/rejected: -2.0278
  • Rewards/accuracies: 0.8713
  • Rewards/margins: 1.5122
  • Logps/rejected: -88.0817
  • Logps/chosen: -43.7343
  • Logits/rejected: 0.7079
  • Logits/chosen: 0.1945

Model description

More information needed

Intended uses & limitations

More information needed
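
The card does not state intended uses, but the framework versions below list PEFT, so this repository presumably holds a parameter-efficient adapter on top of NanQiangHF/llama3_8b_instruct_bwgenerator. A minimal loading sketch under that assumption (prompt and generation settings are illustrative, not part of this card) might look like:

```python
# Minimal sketch, assuming this repo is a PEFT adapter for the base model
# named in this card; the prompt and generation settings are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "NanQiangHF/llama3_8b_instruct_bwgenerator"
adapter_id = "NanQiangHF/llama3_8b_instruct_dpo_bwgenerator_v2"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "Your prompt here"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```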

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the illustrative training sketch after this list):

  • learning_rate: 2e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1
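
The training script is not included in the card, so the following is only a hedged sketch of how these hyperparameters could be passed to TRL's DPOTrainer; the preference dataset, LoRA settings, and output directory are assumptions, and the exact DPOTrainer API differs between TRL versions.

```python
# Hedged sketch: the hyperparameter values come from this card, but the
# dataset, LoRA config, and trainer wiring are assumptions.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "NanQiangHF/llama3_8b_instruct_bwgenerator"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Placeholder: the card does not name the preference dataset.
train_dataset = load_dataset("your/preference_dataset", split="train")

args = DPOConfig(
    output_dir="llama3_8b_instruct_dpo_bwgenerator_v2",
    learning_rate=2e-06,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)

trainer = DPOTrainer(
    model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # assumption: adapter settings not documented
)
trainer.train()
```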

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.5461 | 0.0719 | 1000 | 0.4574 | -0.0823 | -0.9594 | 0.8261 | 0.8771 | -77.3979 | -39.4010 | 0.6931 | 0.1837 |
| 0.426 | 0.1438 | 2000 | 0.3856 | -0.3308 | -1.6338 | 0.8454 | 1.3030 | -84.1417 | -41.8860 | 0.7041 | 0.1914 |
| 0.3758 | 0.2157 | 3000 | 0.3593 | -0.4540 | -1.9108 | 0.8652 | 1.4567 | -86.9117 | -43.1185 | 0.7065 | 0.1933 |
| 0.3611 | 0.2876 | 4000 | 0.3515 | -0.5039 | -2.0063 | 0.8687 | 1.5024 | -87.8675 | -43.6177 | 0.7088 | 0.1952 |
| 0.3438 | 0.3595 | 5000 | 0.3502 | -0.5107 | -2.0200 | 0.8681 | 1.5093 | -88.0041 | -43.6858 | 0.7085 | 0.1951 |
| 0.357 | 0.4313 | 6000 | 0.3487 | -0.5159 | -2.0325 | 0.8668 | 1.5166 | -88.1288 | -43.7373 | 0.7092 | 0.1955 |
| 0.3562 | 0.5032 | 7000 | 0.3496 | -0.5151 | -2.0278 | 0.8707 | 1.5127 | -88.0820 | -43.7290 | 0.7093 | 0.1956 |
| 0.3597 | 0.5751 | 8000 | 0.3493 | -0.5179 | -2.0304 | 0.8707 | 1.5125 | -88.1081 | -43.7570 | 0.7092 | 0.1956 |
| 0.3437 | 0.6470 | 9000 | 0.3492 | -0.5132 | -2.0264 | 0.8691 | 1.5132 | -88.0680 | -43.7105 | 0.7109 | 0.1971 |
| 0.3544 | 0.7189 | 10000 | 0.3488 | -0.5160 | -2.0301 | 0.8704 | 1.5142 | -88.1054 | -43.7379 | 0.7089 | 0.1953 |
| 0.3451 | 0.7908 | 11000 | 0.3498 | -0.5116 | -2.0235 | 0.8694 | 1.5119 | -88.0395 | -43.6945 | 0.7089 | 0.1951 |
| 0.3543 | 0.8627 | 12000 | 0.3485 | -0.5155 | -2.0306 | 0.8687 | 1.5151 | -88.1099 | -43.7334 | 0.7091 | 0.1955 |
| 0.3609 | 0.9346 | 13000 | 0.3494 | -0.5156 | -2.0278 | 0.8713 | 1.5122 | -88.0817 | -43.7343 | 0.7079 | 0.1945 |
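
For readers unfamiliar with DPO metrics: Rewards/margins is Rewards/chosen minus Rewards/rejected, and Rewards/accuracies is the fraction of evaluation pairs where the chosen response receives the higher implicit reward. A quick check against the final row:

```python
# Sanity check on the final evaluation row above:
# margin = chosen reward - rejected reward.
chosen, rejected = -0.5156, -2.0278
print(round(chosen - rejected, 4))  # 1.5122, matching Rewards/margins
```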

Framework versions

  • PEFT 0.10.0
  • Transformers 4.44.0
  • Pytorch 2.3.0+cu121
  • Datasets 3.0.0
  • Tokenizers 0.19.1