---
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: NanQiangHF/llama3_8b_instruct_bwgenerator
model-index:
- name: llama3_8b_instruct_dpo_bwgenerator
  results: []
---


# llama3_8b_instruct_dpo_bwgenerator

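This model is a PEFT adapter fine-tuned from [NanQiangHF/llama3_8b_instruct_bwgenerator](https://huggingface.co/NanQiangHF/llama3_8b_instruct_bwgenerator) with Direct Preference Optimization (DPO) using TRL, on an unknown dataset.
It achieves the following results on the evaluation set at the last logged evaluation (step 13000 in the training results table):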
- Loss: 0.0706
- Rewards/chosen: -4.6241
- Rewards/rejected: -14.8342
- Rewards/accuracies: 0.9780
- Rewards/margins: 10.2101
- Logps/rejected: -216.1456
- Logps/chosen: -84.8191
- Logits/rejected: 0.9202
- Logits/chosen: 0.3552
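
The reward columns follow TRL's DPO logging conventions. The sketch below shows how each logged quantity relates to the summed per-sequence log-probabilities of the policy and the frozen reference model. It assumes the standard sigmoid DPO loss and `beta=0.1` (TRL's default; the value used for this run is not recorded). `dpo_metrics` and `log_sigmoid` are hypothetical helpers written for illustration, not TRL functions.

```python
import math
from typing import Dict, Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_metrics(
    policy_chosen: Sequence[float],
    policy_rejected: Sequence[float],
    ref_chosen: Sequence[float],
    ref_rejected: Sequence[float],
    beta: float = 0.1,  # assumption: TRL's default; the beta used for this run is unknown
) -> Dict[str, float]:
    """Batch-averaged DPO metrics from summed per-sequence log-probabilities."""
    n = len(policy_chosen)
    # Implicit rewards: beta * (log pi_theta(y|x) - log pi_ref(y|x))
    chosen_rewards = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected_rewards = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    return {
        # Sigmoid DPO loss: -log sigmoid(margin), averaged over pairs
        "loss": sum(-log_sigmoid(m) for m in margins) / n,
        "rewards/chosen": sum(chosen_rewards) / n,
        "rewards/rejected": sum(rejected_rewards) / n,
        # Fraction of pairs where the chosen response earns the higher reward
        "rewards/accuracies": sum(1.0 for m in margins if m > 0) / n,
        "rewards/margins": sum(margins) / n,
        # Policy log-probabilities (not reference) are what Logps/* report
        "logps/chosen": sum(policy_chosen) / n,
        "logps/rejected": sum(policy_rejected) / n,
    }
```

Because every logged field is a mean over preference pairs, Rewards/margins equals Rewards/chosen minus Rewards/rejected, as the reported values confirm: -4.6241 - (-14.8342) = 10.2101. The loss, by contrast, is the mean of per-pair losses rather than the loss at the mean margin, which is why it stays well above -log σ(10.2101) ≈ 4e-5.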

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
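
The learning-rate schedule implied by these hyperparameters can be sketched as follows, under two assumptions: there is no warmup (none is listed), and the total number of optimizer steps is estimated from the training results, where step 13000 corresponds to epoch 0.9346. The decay formula mirrors the one used by `transformers.get_linear_schedule_with_warmup`, but `linear_lr` itself is a hypothetical illustration.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 5e-6, warmup_steps: int = 0) -> float:
    """Learning rate under a linear schedule with optional warmup (no warmup assumed here)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


# Estimate steps per epoch from the last logged row: step 13000 at epoch 0.9346.
# Approximate, because the epoch column is rounded to 4 decimals.
steps_per_epoch = round(13000 / 0.9346)

# Assumption: one device and no gradient accumulation, so each optimizer step
# consumes train_batch_size=4 preference pairs.
approx_pairs_per_epoch = steps_per_epoch * 4

for step in (0, 5000, 13000, steps_per_epoch):
    print(step, f"{linear_lr(step, steps_per_epoch):.3e}")
```

With roughly 13,910 steps in the single epoch, the learning rate at the last logged step (13000) has decayed to about 6.5% of its initial 5e-06.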

### Training results

| Training Loss | Epoch  | Step  | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.247         | 0.0719 | 1000  | 0.0906          | -3.7216        | -11.8877         | 0.9686             | 8.1662          | -186.6814      | -75.7941     | 0.8504          | 0.3080        |
| 0.083         | 0.1438 | 2000  | 0.0775          | -4.5564        | -14.1375         | 0.9764             | 9.5811          | -209.1791      | -84.1423     | 0.8989          | 0.3418        |
| 0.0623        | 0.2157 | 3000  | 0.0734          | -4.5379        | -14.4993         | 0.9770             | 9.9614          | -212.7973      | -83.9572     | 0.9082          | 0.3471        |
| 0.069         | 0.2876 | 4000  | 0.0713          | -4.5601        | -14.6450         | 0.9777             | 10.0850         | -214.2546      | -84.1790     | 0.9145          | 0.3514        |
| 0.0752        | 0.3595 | 5000  | 0.0706          | -4.4918        | -14.6244         | 0.9793             | 10.1326         | -214.0477      | -83.4960     | 0.9181          | 0.3533        |
| 0.0723        | 0.4313 | 6000  | 0.0710          | -4.6381        | -14.8167         | 0.9780             | 10.1787         | -215.9714      | -84.9590     | 0.9187          | 0.3542        |
| 0.0852        | 0.5032 | 7000  | 0.0705          | -4.6251        | -14.8143         | 0.9783             | 10.1893         | -215.9474      | -84.8290     | 0.9189          | 0.3542        |
| 0.0811        | 0.5751 | 8000  | 0.0706          | -4.6409        | -14.8406         | 0.9780             | 10.1997         | -216.2102      | -84.9870     | 0.9185          | 0.3538        |
| 0.0762        | 0.6470 | 9000  | 0.0699          | -4.6161        | -14.8083         | 0.9790             | 10.1921         | -215.8869      | -84.7398     | 0.9186          | 0.3541        |
| 0.0686        | 0.7189 | 10000 | 0.0703          | -4.6164        | -14.8042         | 0.9790             | 10.1878         | -215.8462      | -84.7421     | 0.9185          | 0.3537        |
| 0.061         | 0.7908 | 11000 | 0.0705          | -4.6191        | -14.8169         | 0.9793             | 10.1977         | -215.9726      | -84.7695     | 0.9207          | 0.3556        |
| 0.0786        | 0.8627 | 12000 | 0.0698          | -4.6080        | -14.7978         | 0.9793             | 10.1898         | -215.7822      | -84.6584     | 0.9195          | 0.3546        |
| 0.073         | 0.9346 | 13000 | 0.0706          | -4.6241        | -14.8342         | 0.9780             | 10.2101         | -216.1456      | -84.8191     | 0.9202          | 0.3552        |
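
Two properties of this table can be checked directly: Rewards/margins equals Rewards/chosen minus Rewards/rejected up to rounding, and the lowest validation loss (0.0698 at step 12000) is slightly below the value at step 13000 (0.0706) reported at the top of this card. The short script below, with values copied from the table, is one way to confirm both; it is an illustrative check, not part of the training code.

```python
# (step, validation loss, rewards/chosen, rewards/rejected, rewards/margins), copied from the table above
rows = [
    (1000, 0.0906, -3.7216, -11.8877, 8.1662),
    (2000, 0.0775, -4.5564, -14.1375, 9.5811),
    (3000, 0.0734, -4.5379, -14.4993, 9.9614),
    (4000, 0.0713, -4.5601, -14.6450, 10.0850),
    (5000, 0.0706, -4.4918, -14.6244, 10.1326),
    (6000, 0.0710, -4.6381, -14.8167, 10.1787),
    (7000, 0.0705, -4.6251, -14.8143, 10.1893),
    (8000, 0.0706, -4.6409, -14.8406, 10.1997),
    (9000, 0.0699, -4.6161, -14.8083, 10.1921),
    (10000, 0.0703, -4.6164, -14.8042, 10.1878),
    (11000, 0.0705, -4.6191, -14.8169, 10.1977),
    (12000, 0.0698, -4.6080, -14.7978, 10.1898),
    (13000, 0.0706, -4.6241, -14.8342, 10.2101),
]

# Margins should equal chosen minus rejected rewards, up to 4-decimal rounding.
max_gap = max(abs((c - r) - m) for _, _, c, r, m in rows)

# Checkpoint with the lowest validation loss.
best_step, best_loss, *_ = min(rows, key=lambda row: row[1])

print(f"max margin rounding gap: {max_gap:.4f}")
print(f"best validation loss {best_loss} at step {best_step}")
```

Validation loss is essentially flat from step 5000 onward (between 0.0698 and 0.0710), so the choice among these later checkpoints makes little difference to the evaluation metrics.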


### Framework versions

- PEFT 0.10.0
- Transformers 4.44.0
- Pytorch 2.3.0+cu121
- Datasets 2.14.7
- Tokenizers 0.19.1