# Ministral-8B-Instruct-2410-dpo-mistral-1000
This model is a fine-tuned version of [mistralai/Ministral-8B-Instruct-2410](https://huggingface.co/mistralai/Ministral-8B-Instruct-2410) on the answer_mistral dataset. It achieves the following results on the evaluation set (the reward margin is checked in the snippet after this list):
- Loss: 0.4603
- Rewards/chosen: 0.6091
- Rewards/rejected: -0.6645
- Rewards/accuracies: 0.7700
- Rewards/margins: 1.2736
- Logps/chosen: -27.8114
- Logps/rejected: -40.6013
- Logits/chosen: -1.5222
- Logits/rejected: -1.6375
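
For reference, `Rewards/margins` is simply the chosen reward minus the rejected reward. A quick consistency check of the numbers above (illustrative snippet, not part of the training code):

```python
# Reward margin = reward(chosen) - reward(rejected), using the eval numbers above.
rewards_chosen = 0.6091
rewards_rejected = -0.6645
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 1.2736 -> matches Rewards/margins
```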
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows this list):
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
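
The card does not state which training framework produced this run. The sketch below is an assumption-laden reconstruction using trl's `DPOTrainer` with a PEFT LoRA adapter, mapping the hyperparameters above onto `DPOConfig`. The dataset path, LoRA rank/alpha, and dropout are placeholders, not values from this card.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "mistralai/Ministral-8B-Instruct-2410"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")

# Placeholder: the card only names the dataset "answer_mistral"; DPO expects
# "prompt"/"chosen"/"rejected" columns.
dataset = load_dataset("json", data_files="answer_mistral.json", split="train")

# LoRA settings are NOT given in the card; these are illustrative defaults.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16, lora_dropout=0.05)

args = DPOConfig(
    output_dir="Ministral-8B-Instruct-2410-dpo-mistral-1000",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,   # 2 x 8 = total train batch size of 16
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",             # betas=(0.9, 0.999) and eps=1e-08 are the defaults
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,      # named `tokenizer=` in older trl releases
    peft_config=peft_config,
)
trainer.train()
```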
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:------------:|:--------------:|:-------------:|:---------------:|
| 0.6673 | 0.8909 | 50 | 0.6522 | 0.1120 | 0.0228 | 0.7100 | 0.0892 | -32.7823 | -33.7278 | -1.9306 | -1.9581 |
| 0.4332 | 1.7817 | 100 | 0.4815 | 0.6048 | -0.1330 | 0.7500 | 0.7378 | -27.8545 | -35.2862 | -1.7231 | -1.8002 |
| 0.4024 | 2.6726 | 150 | 0.4603 | 0.6091 | -0.6645 | 0.7700 | 1.2736 | -27.8114 | -40.6013 | -1.5222 | -1.6375 |
| 0.3303 | 3.5635 | 200 | 0.4657 | 0.5792 | -0.8599 | 0.7700 | 1.4391 | -28.1105 | -42.5552 | -1.4494 | -1.5647 |
| 0.3271 | 4.4543 | 250 | 0.4763 | 0.5191 | -1.1649 | 0.7900 | 1.6840 | -28.7117 | -45.6052 | -1.3643 | -1.4784 |
| 0.2876 | 5.3452 | 300 | 0.4949 | 0.5526 | -1.2031 | 0.7900 | 1.7557 | -28.3769 | -45.9875 | -1.3337 | -1.4407 |
| 0.1917 | 6.2361 | 350 | 0.5028 | 0.5230 | -1.3047 | 0.8000 | 1.8278 | -28.6720 | -47.0036 | -1.2966 | -1.4062 |
| 0.2809 | 7.1269 | 400 | 0.4964 | 0.5832 | -1.3130 | 0.8000 | 1.8962 | -28.0704 | -47.0858 | -1.2846 | -1.3927 |
| 0.1975 | 8.0178 | 450 | 0.5028 | 0.5658 | -1.3484 | 0.8100 | 1.9142 | -28.2443 | -47.4404 | -1.2803 | -1.3872 |
| 0.2123 | 8.9087 | 500 | 0.5044 | 0.5590 | -1.3822 | 0.8100 | 1.9412 | -28.3127 | -47.7785 | -1.2744 | -1.3800 |
| 0.2259 | 9.7996 | 550 | 0.5094 | 0.5423 | -1.3976 | 0.8000 | 1.9400 | -28.4790 | -47.9327 | -1.2714 | -1.3782 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3
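
Since PEFT appears in the framework versions, this repository presumably contains a LoRA adapter that must be loaded on top of the base model. A minimal inference sketch under that assumption:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Ministral-8B-Instruct-2410"
adapter_id = "chchen/Ministral-8B-Instruct-2410-dpo-mistral-1000"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attach the DPO-tuned adapter

messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```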