Mistral-Small-24B-Instruct-2501-009-3000

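This model is a fine-tuned version of mistralai/Mistral-Small-24B-Instruct-2501. The training dataset is not specified. Evaluation-set results are listed in the training results table at the end of this card; the last logged validation loss is 0.3848 at step 2900.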

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 1
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 4
optimizer: paged 8-bit AdamW (OptimizerNames.PAGED_ADAMW_8BIT) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 8
mixed_precision_training: Native AMP
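
The relationship between the batch-size settings and the shape of the linear schedule can be sketched as follows. This is a minimal illustration, not the training code: the helper names are hypothetical, the warmup length is assumed to be zero because the card lists none, and the total step count of 3000 is an assumption used only to make the example concrete. The reported total_train_batch_size of 4 is consistent with a single device (1 x 4 x 1).

```python
import math

def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Samples contributing to one optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices

def linear_lr(step, total_steps, base_lr=2e-05, warmup_steps=0):
    """Linear decay from base_lr to 0, with optional linear warmup.

    warmup_steps=0 is an assumption; the card does not list a warmup setting.
    """
    if warmup_steps and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Assumption for illustration only: the card does not state the total step count.
HYPOTHETICAL_TOTAL_STEPS = 3000

if __name__ == "__main__":
    # train_batch_size=1 and gradient_accumulation_steps=4, as listed above
    print("effective batch size:", effective_batch_size(1, 4))
    for step in (0, 1500, 3000):
        print(step, linear_lr(step, HYPOTHETICAL_TOTAL_STEPS))
```

Under these assumptions the learning rate starts at 2e-05, is halved at the midpoint, and reaches zero at the final step.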

Training Loss	Epoch	Step	Validation Loss
1.4124	0.2694	100	1.3776
1.209	0.5387	200	1.2167
1.0526	0.8081	300	1.0705
0.7969	1.0754	400	0.9167
0.7644	1.3448	500	0.8216
0.6831	1.6141	600	0.7254
0.7615	1.8835	700	0.6653
0.5999	2.1508	800	0.6172
0.556	2.4202	900	0.5793
0.5678	2.6896	1000	0.5534
0.5127	2.9589	1100	0.5219
0.4916	3.2263	1200	0.5020
0.3705	3.4956	1300	0.4867
0.4849	3.7650	1400	0.4751
0.4055	4.0323	1500	0.4621
0.428	4.3017	1600	0.4547
0.3895	4.5710	1700	0.4434
0.3481	4.8404	1800	0.4269
0.295	5.1077	1900	0.4222
0.3563	5.3771	2000	0.4167
0.3555	5.6465	2100	0.4090
0.371	5.9158	2200	0.4036
0.3317	6.1832	2300	0.4008
0.3107	6.4525	2400	0.3988
0.3817	6.7219	2500	0.3923
0.2904	6.9912	2600	0.3885
0.3191	7.2586	2700	0.3886
0.3573	7.5279	2800	0.3877
0.358	7.7973	2900	0.3848
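
As a rough aid for reading the table above, the sketch below parses a handful of its rows, picks the checkpoint with the lowest validation loss, flags evaluations where validation loss rose, and estimates the number of optimizer steps per epoch from the Step and Epoch columns. The function and variable names are hypothetical, and only a subset of rows is copied in to keep the example short.

```python
# A few rows copied from the table above:
# training loss, epoch, step, validation loss.
ROWS = """\
1.4124 0.2694 100 1.3776
0.2904 6.9912 2600 0.3885
0.3191 7.2586 2700 0.3886
0.3573 7.5279 2800 0.3877
0.358 7.7973 2900 0.3848
"""

def parse_rows(text):
    """Turn whitespace-separated table rows into a list of dicts."""
    records = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss = line.split()
        records.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
        })
    return records

records = parse_rows(ROWS)

# Checkpoint with the lowest validation loss among the parsed rows.
best = min(records, key=lambda r: r["val_loss"])

# Steps at which validation loss was higher than at the previous parsed row.
regressions = [
    cur["step"]
    for prev, cur in zip(records, records[1:])
    if cur["val_loss"] > prev["val_loss"]
]

# Optimizer steps per epoch, estimated from the first and last parsed rows.
first, last = records[0], records[-1]
steps_per_epoch = (last["step"] - first["step"]) / (last["epoch"] - first["epoch"])

if __name__ == "__main__":
    print("best step:", best["step"], "val loss:", best["val_loss"])
    print("regressions at steps:", regressions)
    print("approx. steps per epoch:", round(steps_per_epoch))
```

With the rows shown, the lowest validation loss is at step 2900, and the only increase is the marginal one at step 2700 (0.3885 to 0.3886), which suggests the curve had largely flattened by the end of training.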