# Mistral-8B-Instruct-2410-009
This model is a fine-tune of mistralai/Ministral-8B-Instruct-2410 (8 billion parameters) on a dataset built from Acuerdo 009 of Universidad del Valle (Univalle). It was trained for 10 epochs with a batch_size of 1, fully using the 24 GB of GPU VRAM, and reached a final loss of:
- Loss: 0.3784
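
The following is a minimal inference sketch, not an official example. It assumes the fine-tune is published as a PEFT adapter under `raulgdp/Mistral-8B-Instruct-2410-009` (the repository this card belongs to) and loads the base model in 4-bit so it fits within roughly 24 GB of VRAM; adjust the quantization settings to your hardware.

```python
# Minimal inference sketch (assumption: the fine-tune is a PEFT adapter
# published as raulgdp/Mistral-8B-Instruct-2410-009; 4-bit loading is used
# here only to keep the base model within ~24 GB of VRAM).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "mistralai/Ministral-8B-Instruct-2410"
adapter_id = "raulgdp/Mistral-8B-Instruct-2410-009"

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

# Placeholder prompt about Acuerdo 009.
messages = [{"role": "user", "content": "¿Qué regula el Acuerdo 009 de Univalle?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```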
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: paged_adamw_8bit with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 10
- mixed_precision_training: Native AMP
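
As an illustration only (the original training script is not included in this card), the hyperparameters above roughly correspond to a `transformers.TrainingArguments` configuration like the sketch below; the output directory and the 100-step evaluation/logging cadence (inferred from the results table) are assumptions.

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
# This is a sketch, not the original training script; the output path and the
# eval/logging cadence (100 steps, matching the results table) are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Mistral-8B-Instruct-2410-009",
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # total train batch size = 1 * 4 = 4
    num_train_epochs=10,
    lr_scheduler_type="linear",
    optim="paged_adamw_8bit",        # paged 8-bit AdamW, betas/epsilon at defaults
    fp16=True,                       # native AMP mixed precision
    seed=42,
    eval_strategy="steps",
    eval_steps=100,
    logging_steps=100,
)
```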
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
1.4242 | 0.2694 | 100 | 1.3903 |
1.2006 | 0.5387 | 200 | 1.2413 |
1.1104 | 0.8081 | 300 | 1.1213 |
0.9163 | 1.0754 | 400 | 1.0207 |
0.8715 | 1.3448 | 500 | 0.9261 |
0.7834 | 1.6141 | 600 | 0.8410 |
0.8636 | 1.8835 | 700 | 0.7739 |
0.7017 | 2.1508 | 800 | 0.7145 |
0.6127 | 2.4202 | 900 | 0.6664 |
0.6409 | 2.6896 | 1000 | 0.6310 |
0.584 | 2.9589 | 1100 | 0.5932 |
0.5592 | 3.2263 | 1200 | 0.5683 |
0.4336 | 3.4956 | 1300 | 0.5447 |
0.5287 | 3.7650 | 1400 | 0.5292 |
0.4449 | 4.0323 | 1500 | 0.5119 |
0.4894 | 4.3017 | 1600 | 0.4994 |
0.436 | 4.5710 | 1700 | 0.4799 |
0.3756 | 4.8404 | 1800 | 0.4676 |
0.3174 | 5.1077 | 1900 | 0.4545 |
0.3721 | 5.3771 | 2000 | 0.4475 |
0.3813 | 5.6465 | 2100 | 0.4367 |
0.3972 | 5.9158 | 2200 | 0.4281 |
0.354 | 6.1832 | 2300 | 0.4244 |
0.3299 | 6.4525 | 2400 | 0.4206 |
0.4017 | 6.7219 | 2500 | 0.4112 |
0.3103 | 6.9912 | 2600 | 0.4060 |
0.3299 | 7.2586 | 2700 | 0.4060 |
0.3874 | 7.5279 | 2800 | 0.3989 |
0.3838 | 7.7973 | 2900 | 0.3940 |
0.3446 | 8.0646 | 3000 | 0.3907 |
0.2674 | 8.3340 | 3100 | 0.3887 |
0.2681 | 8.6034 | 3200 | 0.3839 |
0.2922 | 8.8727 | 3300 | 0.3806 |
0.3125 | 9.1401 | 3400 | 0.3820 |
0.3042 | 9.4094 | 3500 | 0.3802 |
0.2623 | 9.6788 | 3600 | 0.3790 |
0.3382 | 9.9481 | 3700 | 0.3784 |
### Framework versions
- PEFT 0.15.2
- Transformers 4.51.3
- Pytorch 2.6.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1
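
A quick way to check an environment against the pinned versions above (an optional convenience snippet, not part of the original card):

```python
# Print installed versions to compare against the pins listed above.
import datasets
import peft
import tokenizers
import torch
import transformers

print("PEFT:", peft.__version__)                  # expected 0.15.2
print("Transformers:", transformers.__version__)  # expected 4.51.3
print("PyTorch:", torch.__version__)              # expected 2.6.0+cu126
print("Datasets:", datasets.__version__)          # expected 3.5.0
print("Tokenizers:", tokenizers.__version__)      # expected 0.21.1
```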