# IE_M2_1000steps_1e7rate_05beta_cSFTDPO
This model is a fine-tuned version of tsavage68/IE_M2_1000steps_1e7rate_SFT on an unknown dataset. It achieves the following results on the evaluation set (the reward columns are defined in the note after the list):
- Loss: 0.3743
- Rewards/chosen: -0.4914
- Rewards/rejected: -7.5827
- Rewards/accuracies: 0.4600
- Rewards/margins: 7.0912
- Logps/rejected: -56.1871
- Logps/chosen: -43.1884
- Logits/rejected: -2.8937
- Logits/chosen: -2.8314
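
The reward columns follow the usual DPO implicit-reward convention; a sketch, assuming β = 0.5 as the "05beta" suffix in the model name suggests (the card itself does not state the value):

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\text{Rewards/margins} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})
$$

Consistent with this, the reported margin is the chosen-minus-rejected difference: $-0.4914 - (-7.5827) \approx 7.0912$.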
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch mapping them onto a TRL run follows the list):
- learning_rate: 1e-07
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
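
The sketch below maps these settings onto a TRL `DPOTrainer` run. It is illustrative only: the preference dataset, the β = 0.5 value (read off the "05beta" suffix in the model name), and the exact `DPOConfig`/`DPOTrainer` signature (which varies across TRL versions; the card does not record which was used) are assumptions.

```python
# Hedged sketch of a DPO run matching the hyperparameters listed above.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

sft_checkpoint = "tsavage68/IE_M2_1000steps_1e7rate_SFT"  # SFT model named on this card
tokenizer = AutoTokenizer.from_pretrained(sft_checkpoint)
model = AutoModelForCausalLM.from_pretrained(sft_checkpoint)

config = DPOConfig(
    output_dir="IE_M2_1000steps_1e7rate_05beta_cSFTDPO",
    learning_rate=1e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective train batch size 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    beta=0.5,  # assumed from the model name; not stated on the card
)

# Placeholder dataset: the card does not identify the preference data.
# DPO expects "prompt", "chosen" and "rejected" columns.
train_dataset = load_dataset("json", data_files="preference_pairs.jsonl")["train"]

trainer = DPOTrainer(
    model=model,  # a reference model is created internally when none is passed
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```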
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.4538 | 0.4 | 50 | 0.3745 | -0.1879 | -3.8972 | 0.4600 | 3.7093 | -48.8163 | -42.5814 | -2.9040 | -2.8425 |
| 0.3812 | 0.8 | 100 | 0.3743 | -0.3333 | -6.0415 | 0.4600 | 5.7082 | -53.1048 | -42.8721 | -2.8978 | -2.8359 |
| 0.3119 | 1.2 | 150 | 0.3743 | -0.4581 | -7.1508 | 0.4600 | 6.6927 | -55.3234 | -43.1217 | -2.8957 | -2.8335 |
| 0.3639 | 1.6 | 200 | 0.3743 | -0.4596 | -7.1592 | 0.4600 | 6.6996 | -55.3402 | -43.1247 | -2.8956 | -2.8335 |
| 0.4332 | 2.0 | 250 | 0.3743 | -0.4698 | -7.3147 | 0.4600 | 6.8449 | -55.6512 | -43.1451 | -2.8944 | -2.8322 |
| 0.3986 | 2.4 | 300 | 0.3743 | -0.4645 | -7.3358 | 0.4600 | 6.8713 | -55.6934 | -43.1344 | -2.8945 | -2.8322 |
| 0.3986 | 2.8 | 350 | 0.3743 | -0.4758 | -7.3590 | 0.4600 | 6.8832 | -55.7398 | -43.1571 | -2.8947 | -2.8325 |
| 0.4505 | 3.2 | 400 | 0.3743 | -0.4808 | -7.3913 | 0.4600 | 6.9105 | -55.8044 | -43.1671 | -2.8944 | -2.8321 |
| 0.4505 | 3.6 | 450 | 0.3743 | -0.4859 | -7.4793 | 0.4600 | 6.9934 | -55.9805 | -43.1774 | -2.8942 | -2.8319 |
| 0.4332 | 4.0 | 500 | 0.3743 | -0.4895 | -7.5333 | 0.4600 | 7.0438 | -56.0884 | -43.1845 | -2.8937 | -2.8314 |
| 0.3292 | 4.4 | 550 | 0.3743 | -0.4880 | -7.5663 | 0.4600 | 7.0782 | -56.1543 | -43.1815 | -2.8938 | -2.8316 |
| 0.3639 | 4.8 | 600 | 0.3743 | -0.4870 | -7.5730 | 0.4600 | 7.0860 | -56.1677 | -43.1795 | -2.8936 | -2.8313 |
| 0.4505 | 5.2 | 650 | 0.3743 | -0.4897 | -7.5693 | 0.4600 | 7.0796 | -56.1604 | -43.1849 | -2.8935 | -2.8312 |
| 0.4505 | 5.6 | 700 | 0.3743 | -0.4895 | -7.5788 | 0.4600 | 7.0893 | -56.1795 | -43.1845 | -2.8938 | -2.8316 |
| 0.3639 | 6.0 | 750 | 0.3743 | -0.4877 | -7.5842 | 0.4600 | 7.0965 | -56.1901 | -43.1808 | -2.8935 | -2.8312 |
| 0.2426 | 6.4 | 800 | 0.3743 | -0.4987 | -7.5876 | 0.4600 | 7.0889 | -56.1971 | -43.2030 | -2.8938 | -2.8316 |
| 0.5025 | 6.8 | 850 | 0.3743 | -0.4942 | -7.5824 | 0.4600 | 7.0882 | -56.1866 | -43.1939 | -2.8937 | -2.8314 |
| 0.3119 | 7.2 | 900 | 0.3743 | -0.4890 | -7.5862 | 0.4600 | 7.0972 | -56.1942 | -43.1835 | -2.8936 | -2.8314 |
| 0.3466 | 7.6 | 950 | 0.3743 | -0.4914 | -7.5827 | 0.4600 | 7.0912 | -56.1871 | -43.1884 | -2.8937 | -2.8314 |
| 0.3812 | 8.0 | 1000 | 0.3743 | -0.4914 | -7.5827 | 0.4600 | 7.0912 | -56.1871 | -43.1884 | -2.8937 | -2.8314 |
### Framework versions
- Transformers 4.44.2
- Pytorch 2.0.0+cu117
- Datasets 3.0.0
- Tokenizers 0.19.1
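
With the versions above, the published checkpoint loads like any other Transformers causal LM descended from Mistral-7B-Instruct-v0.2. A minimal inference sketch; the prompt text and generation settings are illustrative assumptions, not taken from the card:

```python
# Hedged usage sketch: load the checkpoint and generate with its chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/IE_M2_1000steps_1e7rate_05beta_cSFTDPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Example prompt (assumed): "IE" in the model name suggests information extraction.
messages = [{"role": "user", "content": "Extract the entities from: 'Acme Corp hired Jane Doe in 2021.'"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```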
## Model tree for tsavage68/IE_M2_1000steps_1e7rate_05beta_cSFTDPO

- Base model: mistralai/Mistral-7B-Instruct-v0.2
- Finetuned from: tsavage68/IE_M2_1000steps_1e7rate_SFT