Ministral-8B-Instruct-2410-dpo-mistral-1000

This model is a fine-tuned version of mistralai/Ministral-8B-Instruct-2410 on the answer_mistral dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4603
  • Rewards/chosen: 0.6091
  • Rewards/rejected: -0.6645
  • Rewards/accuracies: 0.7700
  • Rewards/margins: 1.2736
  • Logps/chosen: -27.8114
  • Logps/rejected: -40.6013
  • Logits/chosen: -1.5222
  • Logits/rejected: -1.6375
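
A minimal loading sketch follows, assuming the checkpoint is a PEFT (LoRA) adapter on top of the base instruct model (the framework versions below list PEFT, and the repository id is taken from the card title under the chchen namespace); adjust dtype and device settings for your hardware.

```python
# Minimal inference sketch: load the base model and attach the DPO-tuned PEFT adapter.
# Assumes the adapter is hosted at chchen/Ministral-8B-Instruct-2410-dpo-mistral-1000.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Ministral-8B-Instruct-2410"
adapter_id = "chchen/Ministral-8B-Instruct-2410-dpo-mistral-1000"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

messages = [{"role": "user", "content": "Give one tip for writing clear documentation."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```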

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
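
The card does not name the training framework, so the following is only an illustrative sketch of how these hyperparameters would map onto TRL's DPOTrainer with a LoRA adapter. The dataset loading path, the LoRA settings, and the output directory are assumptions, not values stated in the card.

```python
# Illustrative only: mapping the listed hyperparameters onto TRL's DPOTrainer.
# The actual training stack is not stated in the card; LoRA settings and file paths below are assumptions.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "mistralai/Ministral-8B-Instruct-2410"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# "answer_mistral" is the dataset named in the card; its location and format are not specified.
# DPO expects preference pairs, i.e. "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("json", data_files="answer_mistral.json", split="train")

# Assumed LoRA values; the card only confirms that PEFT was used.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16, lora_dropout=0.05)

args = DPOConfig(
    output_dir="Ministral-8B-Instruct-2410-dpo-mistral-1000",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,   # 2 x 8 = total train batch size 16
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # use tokenizer=... on older TRL releases
    peft_config=peft_config,
)
trainer.train()
```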

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---------------|--------|------|-----------------|----------------|------------------|--------------------|-----------------|--------------|----------------|---------------|-----------------|
| 0.6673        | 0.8909 | 50   | 0.6522          | 0.1120         | 0.0228           | 0.7100             | 0.0892          | -32.7823     | -33.7278       | -1.9306       | -1.9581         |
| 0.4332        | 1.7817 | 100  | 0.4815          | 0.6048         | -0.1330          | 0.7500             | 0.7378          | -27.8545     | -35.2862       | -1.7231       | -1.8002         |
| 0.4024        | 2.6726 | 150  | 0.4603          | 0.6091         | -0.6645          | 0.7700             | 1.2736          | -27.8114     | -40.6013       | -1.5222       | -1.6375         |
| 0.3303        | 3.5635 | 200  | 0.4657          | 0.5792         | -0.8599          | 0.7700             | 1.4391          | -28.1105     | -42.5552       | -1.4494       | -1.5647         |
| 0.3271        | 4.4543 | 250  | 0.4763          | 0.5191         | -1.1649          | 0.7900             | 1.6840          | -28.7117     | -45.6052       | -1.3643       | -1.4784         |
| 0.2876        | 5.3452 | 300  | 0.4949          | 0.5526         | -1.2031          | 0.7900             | 1.7557          | -28.3769     | -45.9875       | -1.3337       | -1.4407         |
| 0.1917        | 6.2361 | 350  | 0.5028          | 0.5230         | -1.3047          | 0.8000             | 1.8278          | -28.6720     | -47.0036       | -1.2966       | -1.4062         |
| 0.2809        | 7.1269 | 400  | 0.4964          | 0.5832         | -1.3130          | 0.8000             | 1.8962          | -28.0704     | -47.0858       | -1.2846       | -1.3927         |
| 0.1975        | 8.0178 | 450  | 0.5028          | 0.5658         | -1.3484          | 0.8100             | 1.9142          | -28.2443     | -47.4404       | -1.2803       | -1.3872         |
| 0.2123        | 8.9087 | 500  | 0.5044          | 0.5590         | -1.3822          | 0.8100             | 1.9412          | -28.3127     | -47.7785       | -1.2744       | -1.3800         |
| 0.2259        | 9.7996 | 550  | 0.5094          | 0.5423         | -1.3976          | 0.8000             | 1.9400          | -28.4790     | -47.9327       | -1.2714       | -1.3782         |
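
For reference, Rewards/margins is the gap between Rewards/chosen and Rewards/rejected (at step 150: 0.6091 - (-0.6645) = 1.2736), and the evaluation metrics reported at the top of the card match the step-150 row, which has the lowest validation loss (0.4603).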

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3