# checkpoints-mistral-300M
This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 2.4867
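Since the reported loss is a mean cross-entropy in nats, the corresponding perplexity can be derived directly from it (a quick sketch; the value below is computed from the reported loss, not an independently measured metric):

```python
import math

eval_loss = 2.4867  # final validation loss reported above (cross-entropy, nats)

# For causal language models, perplexity is exp of the mean cross-entropy loss.
perplexity = math.exp(eval_loss)
print(f"perplexity: {perplexity:.2f}")  # ~12.02
```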
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 6
- eval_batch_size: 6
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 192
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.95) and epsilon=0.0001
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 4
- num_epochs: 6
- mixed_precision_training: Native AMP
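The aggregate batch sizes above follow from the per-device settings. A quick consistency check (variable names are illustrative, mirroring the usual Hugging Face `TrainingArguments` fields; how the run was actually configured is an assumption):

```python
# Effective batch sizes implied by the hyperparameters listed above.
per_device_train_batch_size = 6
per_device_eval_batch_size = 6
num_devices = 2
gradient_accumulation_steps = 16

# Gradient accumulation multiplies the effective training batch size.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
# Evaluation does not use gradient accumulation.
total_eval_batch_size = per_device_eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # 192 12
```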
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
4.5141 | 0.09 | 1000 | 4.5160 |
3.7879 | 0.18 | 2000 | 3.8531 |
3.5484 | 0.27 | 3000 | 3.5881 |
3.3734 | 0.36 | 4000 | 3.4287 |
3.2722 | 0.45 | 5000 | 3.3144 |
3.2276 | 0.54 | 6000 | 3.2299 |
3.1809 | 0.63 | 7000 | 3.1597 |
3.0706 | 0.72 | 8000 | 3.1043 |
3.0185 | 0.81 | 9000 | 3.0578 |
2.9496 | 0.9 | 10000 | 3.0157 |
2.9374 | 0.99 | 11000 | 2.9815 |
2.8794 | 1.08 | 12000 | 2.9487 |
2.8407 | 1.17 | 13000 | 2.9229 |
2.8818 | 1.26 | 14000 | 2.8973 |
2.8167 | 1.35 | 15000 | 2.8730 |
2.7941 | 1.44 | 16000 | 2.8515 |
2.7878 | 1.53 | 17000 | 2.8311 |
2.7894 | 1.62 | 18000 | 2.8113 |
2.7158 | 1.71 | 19000 | 2.7935 |
2.7409 | 1.8 | 20000 | 2.7765 |
2.7349 | 1.89 | 21000 | 2.7613 |
2.6631 | 1.98 | 22000 | 2.7451 |
2.6766 | 2.07 | 23000 | 2.7353 |
2.6405 | 2.16 | 24000 | 2.7231 |
2.6707 | 2.25 | 25000 | 2.7121 |
2.6362 | 2.34 | 26000 | 2.7005 |
2.5997 | 2.43 | 27000 | 2.6904 |
2.6549 | 2.52 | 28000 | 2.6798 |
2.6056 | 2.61 | 29000 | 2.6688 |
2.5722 | 2.7 | 30000 | 2.6594 |
2.6179 | 2.79 | 31000 | 2.6509 |
2.6064 | 2.88 | 32000 | 2.6423 |
2.5836 | 2.97 | 33000 | 2.6340 |
2.5502 | 3.06 | 34000 | 2.6285 |
2.5428 | 3.15 | 35000 | 2.6218 |
2.5342 | 3.24 | 36000 | 2.6160 |
2.5152 | 3.33 | 37000 | 2.6090 |
2.5138 | 5.13 | 38000 | 2.5766 |
2.5032 | 5.27 | 39000 | 2.5683 |
2.4783 | 5.4 | 40000 | 2.5609 |
2.4519 | 5.54 | 41000 | 2.5545 |
2.4918 | 5.67 | 42000 | 2.5472 |
2.4591 | 5.81 | 43000 | 2.5411 |
2.4756 | 5.94 | 44000 | 2.5354 |
2.4434 | 6.08 | 45000 | 2.5345 |
2.4312 | 6.21 | 46000 | 2.5301 |
2.4576 | 6.35 | 47000 | 2.5242 |
2.4343 | 6.48 | 48000 | 2.5192 |
2.426 | 6.62 | 49000 | 2.5139 |
2.4136 | 6.75 | 50000 | 2.5084 |
2.4463 | 6.89 | 51000 | 2.5037 |
2.345 | 7.02 | 52000 | 2.5016 |
2.3736 | 7.16 | 53000 | 2.4990 |
2.4092 | 7.29 | 54000 | 2.4955 |
2.3689 | 7.43 | 55000 | 2.4917 |
2.3797 | 7.56 | 56000 | 2.4867 |
### Framework versions
- Transformers 4.35.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.5
- Tokenizers 0.14.1