
	---
	library_name: peft
	tags:
	- trl
	- dpo
	- generated_from_trainer
	base_model: NanQiangHF/llama3_8b_instruct_bwgenerator
	model-index:
	- name: llama3_8b_instruct_dpo_bwgenerator
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# llama3_8b_instruct_dpo_bwgenerator

	This model is a PEFT adapter fine-tuned from [NanQiangHF/llama3_8b_instruct_bwgenerator](https://huggingface.co/NanQiangHF/llama3_8b_instruct_bwgenerator) with DPO (via TRL). The training dataset is not recorded in this card.
	At the last logged evaluation (step 13000, epoch 0.9346), it achieves the following results on the evaluation set:
	- Loss: 0.0706
	- Rewards/chosen: -4.6241
	- Rewards/rejected: -14.8342
	- Rewards/accuracies: 0.9780
	- Rewards/margins: 10.2101
	- Logps/rejected: -216.1456
	- Logps/chosen: -84.8191
	- Logits/rejected: 0.9202
	- Logits/chosen: 0.3552
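
The reward figures above can be cross-checked against each other. The sketch below assumes the usual TRL logging convention, in which `Rewards/margins` is the mean of the chosen reward minus the rejected reward; the variable names are illustrative and not part of the card. Both rewards are negative, so the tuned model assigns lower log-probability than the reference model to both responses, but it penalizes the rejected responses far more.

```python
# Values copied from the evaluation metrics listed above.
rewards_chosen = -4.6241
rewards_rejected = -14.8342
reported_margin = 10.2101

# Assumed convention: margin = chosen reward - rejected reward.
# Because the mean of a difference equals the difference of the means,
# the reported averages should satisfy the same identity.
margin = rewards_chosen - rewards_rejected

print(f"computed margin: {margin:.4f}")
print(f"agrees with reported value: {abs(margin - reported_margin) < 1e-3}")
```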

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-06
	- train_batch_size: 4
	- eval_batch_size: 4
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 1
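
To make `lr_scheduler_type: linear` concrete, here is a minimal sketch of a linear decay schedule starting from the listed learning rate. The card does not list any warmup and does not state the total number of optimizer steps, so `warmup_steps=0` and the `total_steps` value in the usage lines are assumptions chosen for illustration, and `linear_lr` is a hypothetical helper rather than a library function.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 5e-06,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to zero."""
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / warmup_steps
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical total step count, used only to illustrate the shape of the schedule.
total_steps = 14000
for step in (0, 7000, 13000, 14000):
    print(step, linear_lr(step, total_steps))
```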

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:------:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.247 | 0.0719 | 1000 | 0.0906 | -3.7216 | -11.8877 | 0.9686 | 8.1662 | -186.6814 | -75.7941 | 0.8504 | 0.3080 |
	| 0.083 | 0.1438 | 2000 | 0.0775 | -4.5564 | -14.1375 | 0.9764 | 9.5811 | -209.1791 | -84.1423 | 0.8989 | 0.3418 |
	| 0.0623 | 0.2157 | 3000 | 0.0734 | -4.5379 | -14.4993 | 0.9770 | 9.9614 | -212.7973 | -83.9572 | 0.9082 | 0.3471 |
	| 0.069 | 0.2876 | 4000 | 0.0713 | -4.5601 | -14.6450 | 0.9777 | 10.0850 | -214.2546 | -84.1790 | 0.9145 | 0.3514 |
	| 0.0752 | 0.3595 | 5000 | 0.0706 | -4.4918 | -14.6244 | 0.9793 | 10.1326 | -214.0477 | -83.4960 | 0.9181 | 0.3533 |
	| 0.0723 | 0.4313 | 6000 | 0.0710 | -4.6381 | -14.8167 | 0.9780 | 10.1787 | -215.9714 | -84.9590 | 0.9187 | 0.3542 |
	| 0.0852 | 0.5032 | 7000 | 0.0705 | -4.6251 | -14.8143 | 0.9783 | 10.1893 | -215.9474 | -84.8290 | 0.9189 | 0.3542 |
	| 0.0811 | 0.5751 | 8000 | 0.0706 | -4.6409 | -14.8406 | 0.9780 | 10.1997 | -216.2102 | -84.9870 | 0.9185 | 0.3538 |
	| 0.0762 | 0.6470 | 9000 | 0.0699 | -4.6161 | -14.8083 | 0.9790 | 10.1921 | -215.8869 | -84.7398 | 0.9186 | 0.3541 |
	| 0.0686 | 0.7189 | 10000 | 0.0703 | -4.6164 | -14.8042 | 0.9790 | 10.1878 | -215.8462 | -84.7421 | 0.9185 | 0.3537 |
	| 0.061 | 0.7908 | 11000 | 0.0705 | -4.6191 | -14.8169 | 0.9793 | 10.1977 | -215.9726 | -84.7695 | 0.9207 | 0.3556 |
	| 0.0786 | 0.8627 | 12000 | 0.0698 | -4.6080 | -14.7978 | 0.9793 | 10.1898 | -215.7822 | -84.6584 | 0.9195 | 0.3546 |
	| 0.073 | 0.9346 | 13000 | 0.0706 | -4.6241 | -14.8342 | 0.9780 | 10.2101 | -216.1456 | -84.8191 | 0.9202 | 0.3552 |
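
The evaluation log above can be summarized programmatically. The following sketch copies four columns from the table, finds the evaluation step with the lowest validation loss, and estimates the number of optimizer steps per epoch from the last row. The names `eval_rows`, `best`, and `steps_per_epoch` are illustrative, and the steps-per-epoch figure is only an estimate because the Epoch column is rounded to four decimals.

```python
# (step, epoch, validation_loss, rewards_accuracy), copied from the table above.
eval_rows = [
    (1000, 0.0719, 0.0906, 0.9686),
    (2000, 0.1438, 0.0775, 0.9764),
    (3000, 0.2157, 0.0734, 0.9770),
    (4000, 0.2876, 0.0713, 0.9777),
    (5000, 0.3595, 0.0706, 0.9793),
    (6000, 0.4313, 0.0710, 0.9780),
    (7000, 0.5032, 0.0705, 0.9783),
    (8000, 0.5751, 0.0706, 0.9780),
    (9000, 0.6470, 0.0699, 0.9790),
    (10000, 0.7189, 0.0703, 0.9790),
    (11000, 0.7908, 0.0705, 0.9793),
    (12000, 0.8627, 0.0698, 0.9793),
    (13000, 0.9346, 0.0706, 0.9780),
]

# Evaluation row with the lowest validation loss.
best = min(eval_rows, key=lambda row: row[2])
best_step, best_loss = best[0], best[2]

# Rough estimate of optimizer steps in one full epoch, from the final row.
last_step, last_epoch = eval_rows[-1][0], eval_rows[-1][1]
steps_per_epoch = last_step / last_epoch

print(f"lowest validation loss {best_loss} at step {best_step}")
print(f"estimated steps per epoch: {steps_per_epoch:.0f}")
```

Validation loss changes little after step 5000 (it stays between 0.0698 and 0.0710), so the differences between the later evaluation steps are small.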


	### Framework versions

	- PEFT 0.10.0
	- Transformers 4.44.0
	- PyTorch 2.3.0+cu121
	- Datasets 2.14.7
	- Tokenizers 0.19.1
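
When trying to reproduce this setup, it can help to compare a local environment against the versions listed above. The sketch below uses only the standard library. The distribution names (in particular `torch` for PyTorch) are assumptions about how the packages were installed, and `compare_versions` is a hypothetical helper, not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; keys are assumed distribution names.
CARD_VERSIONS = {
    "peft": "0.10.0",
    "transformers": "4.44.0",
    "torch": "2.3.0+cu121",
    "datasets": "2.14.7",
    "tokenizers": "0.19.1",
}


def compare_versions(expected: dict) -> dict:
    """Map each package to (wanted, installed_or_None, exact_match)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # Package is not installed in this environment.
        report[package] = (wanted, installed, installed == wanted)
    return report


for package, (wanted, installed, ok) in compare_versions(CARD_VERSIONS).items():
    status = "ok" if ok else "differs"
    print(f"{package}: card={wanted} installed={installed} ({status})")
```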