Llama-31-8B_task-1_60-samples_config-4

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the GaetanMichelet/chat-60_ft_task-1 dataset. It achieves the following results on the evaluation set:

Loss: 1.2912

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 16
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 150

Training results

Training Loss	Epoch	Step	Validation Loss
2.1668	0.6957	2	2.0787
2.1494	1.7391	5	2.0737
2.127	2.7826	8	2.0633
2.023	3.8261	11	2.0470
2.0881	4.8696	14	2.0213
2.0378	5.9130	17	1.9882
2.0073	6.9565	20	1.9451
1.9467	8.0	23	1.8881
1.9133	8.6957	25	1.8427
1.8018	9.7391	28	1.7652
1.691	10.7826	31	1.6856
1.6135	11.8261	34	1.6190
1.5471	12.8696	37	1.5723
1.5155	13.9130	40	1.5440
1.4371	14.9565	43	1.5235
1.4825	16.0	46	1.5019
1.4532	16.6957	48	1.4894
1.4277	17.7391	51	1.4693
1.366	18.7826	54	1.4504
1.4417	19.8261	57	1.4330
1.3645	20.8696	60	1.4170
1.3153	21.9130	63	1.4029
1.3036	22.9565	66	1.3847
1.2775	24.0	69	1.3692
1.2726	24.6957	71	1.3621
1.2949	25.7391	74	1.3510
1.1424	26.7826	77	1.3406
1.2489	27.8261	80	1.3327
1.1662	28.8696	83	1.3225
1.1614	29.9130	86	1.3144
1.146	30.9565	89	1.3094
1.1177	32.0	92	1.3025
1.0748	32.6957	94	1.2985
1.118	33.7391	97	1.2957
1.0599	34.7826	100	1.2924
1.0607	35.8261	103	1.2912
1.0041	36.8696	106	1.2955
1.0132	37.9130	109	1.2980
1.0062	38.9565	112	1.3068
0.9466	40.0	115	1.3118
0.9728	40.6957	117	1.3147
0.882	41.7391	120	1.3195
0.9193	42.7826	123	1.3276

Framework versions

PEFT 0.12.0
Transformers 4.44.0
Pytorch 2.1.2+cu121
Datasets 2.20.0
Tokenizers 0.19.1

GaetanMichelet
/

Llama-31-8B_task-1_60-samples_config-4

Llama-31-8B_task-1_60-samples_config-4

Model description

Intended uses & limitations

Training and evaluation data

Training procedure

Training hyperparameters

Training results

Framework versions

Model tree for GaetanMichelet/Llama-31-8B_task-1_60-samples_config-4

Collection including GaetanMichelet/Llama-31-8B_task-1_60-samples_config-4

Configurations choice

Evaluation results