Ministral-8B-Instruct-2410-dpo-mistral-1000

This model is a fine-tuned version of mistralai/Ministral-8B-Instruct-2410 on the answer_mistral dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4603
  • Rewards/chosen: 0.6091
  • Rewards/rejected: -0.6645
  • Rewards/accuracies: 0.7700
  • Rewards/margins: 1.2736
  • Logps/chosen: -27.8114
  • Logps/rejected: -40.6013
  • Logits/chosen: -1.5222
  • Logits/rejected: -1.6375
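
A minimal loading sketch follows, assuming the checkpoint is a PEFT (LoRA) adapter on top of the base instruct model (the framework versions below list PEFT, and the repository id is taken from the card title under the chchen namespace); adjust dtype and device settings for your hardware.

```python
# Minimal inference sketch: load the base model and attach the DPO-tuned PEFT adapter.
# Assumes the adapter is hosted at chchen/Ministral-8B-Instruct-2410-dpo-mistral-1000.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Ministral-8B-Instruct-2410"
adapter_id = "chchen/Ministral-8B-Instruct-2410-dpo-mistral-1000"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

messages = [{"role": "user", "content": "Give one tip for writing clear documentation."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```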

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
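
The card does not name the training framework, so the following is only an illustrative sketch of how these hyperparameters would map onto TRL's DPOTrainer with a LoRA adapter. The dataset loading path, the LoRA settings, and the output directory are assumptions, not values stated in the card.

```python
# Illustrative only: mapping the listed hyperparameters onto TRL's DPOTrainer.
# The actual training stack is not stated in the card; LoRA settings and file paths below are assumptions.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "mistralai/Ministral-8B-Instruct-2410"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# "answer_mistral" is the dataset named in the card; its location and format are not specified.
# DPO expects preference pairs, i.e. "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("json", data_files="answer_mistral.json", split="train")

# Assumed LoRA values; the card only confirms that PEFT was used.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16, lora_dropout=0.05)

args = DPOConfig(
    output_dir="Ministral-8B-Instruct-2410-dpo-mistral-1000",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,   # 2 x 8 = total train batch size 16
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # use tokenizer=... on older TRL releases
    peft_config=peft_config,
)
trainer.train()
```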

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---------------|--------|------|-----------------|----------------|------------------|--------------------|-----------------|--------------|----------------|---------------|-----------------|
| 0.6673        | 0.8909 | 50   | 0.6522          | 0.1120         | 0.0228           | 0.7100             | 0.0892          | -32.7823     | -33.7278       | -1.9306       | -1.9581         |
| 0.4332        | 1.7817 | 100  | 0.4815          | 0.6048         | -0.1330          | 0.7500             | 0.7378          | -27.8545     | -35.2862       | -1.7231       | -1.8002         |
| 0.4024        | 2.6726 | 150  | 0.4603          | 0.6091         | -0.6645          | 0.7700             | 1.2736          | -27.8114     | -40.6013       | -1.5222       | -1.6375         |
| 0.3303        | 3.5635 | 200  | 0.4657          | 0.5792         | -0.8599          | 0.7700             | 1.4391          | -28.1105     | -42.5552       | -1.4494       | -1.5647         |
| 0.3271        | 4.4543 | 250  | 0.4763          | 0.5191         | -1.1649          | 0.7900             | 1.6840          | -28.7117     | -45.6052       | -1.3643       | -1.4784         |
| 0.2876        | 5.3452 | 300  | 0.4949          | 0.5526         | -1.2031          | 0.7900             | 1.7557          | -28.3769     | -45.9875       | -1.3337       | -1.4407         |
| 0.1917        | 6.2361 | 350  | 0.5028          | 0.5230         | -1.3047          | 0.8000             | 1.8278          | -28.6720     | -47.0036       | -1.2966       | -1.4062         |
| 0.2809        | 7.1269 | 400  | 0.4964          | 0.5832         | -1.3130          | 0.8000             | 1.8962          | -28.0704     | -47.0858       | -1.2846       | -1.3927         |
| 0.1975        | 8.0178 | 450  | 0.5028          | 0.5658         | -1.3484          | 0.8100             | 1.9142          | -28.2443     | -47.4404       | -1.2803       | -1.3872         |
| 0.2123        | 8.9087 | 500  | 0.5044          | 0.5590         | -1.3822          | 0.8100             | 1.9412          | -28.3127     | -47.7785       | -1.2744       | -1.3800         |
| 0.2259        | 9.7996 | 550  | 0.5094          | 0.5423         | -1.3976          | 0.8000             | 1.9400          | -28.4790     | -47.9327       | -1.2714       | -1.3782         |
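
For reference, Rewards/margins is the gap between Rewards/chosen and Rewards/rejected (at step 150: 0.6091 - (-0.6645) = 1.2736), and the evaluation metrics reported at the top of the card match the step-150 row, which has the lowest validation loss (0.4603).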

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3