Baby-Llama-58M-RUN3_3

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.8148
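
For scale, if this is the usual mean per-token cross-entropy loss in nats (the standard Trainer convention; the card does not say), it corresponds to a perplexity of roughly exp(3.8148) ≈ 45.4. A minimal check:

```python
import math

# Assuming the reported evaluation loss is mean per-token
# cross-entropy in nats, perplexity is simply its exponential.
eval_loss = 3.8148
print(f"perplexity ≈ {math.exp(eval_loss):.1f}")  # ≈ 45.4
```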

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.00025
  • train_batch_size: 32
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 120
  • mixed_precision_training: Native AMP
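
A minimal sketch of how these settings could map onto Hugging Face `TrainingArguments`; the `output_dir` value and the `fp16` flag (standing in for "Native AMP") are assumptions, not taken from the card:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="Baby-Llama-58M-RUN3_3",  # placeholder path, not from the card
    learning_rate=2.5e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    num_train_epochs=120,
    fp16=True,  # assumed equivalent of "Native AMP" mixed precision
)
```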

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 297.4542      | 1.0   | 12   | 250.9910        |
| 229.6338      | 2.0   | 24   | 208.3821        |
| 208.295       | 3.0   | 36   | 179.5238        |
| 129.018       | 4.0   | 48   | 112.9940        |
| 82.9929       | 5.0   | 60   | 74.3020         |
| 46.9522       | 6.0   | 72   | 42.2297         |
| 24.9202       | 7.0   | 84   | 23.4095         |
| 15.2942       | 8.0   | 96   | 13.3510         |
| 10.0619       | 9.0   | 108  | 9.7284          |
| 7.784         | 10.0  | 120  | 7.8737          |
| 6.4759        | 11.0  | 132  | 7.2488          |
| 6.1744        | 12.0  | 144  | 6.3695          |
| 5.4904        | 13.0  | 156  | 6.2293          |
| 5.4665        | 14.0  | 168  | 5.8846          |
| 4.731         | 15.0  | 180  | 5.8094          |
| 4.7619        | 16.0  | 192  | 5.4680          |
| 4.6858        | 17.0  | 204  | 5.4562          |
| 4.594         | 18.0  | 216  | 5.2367          |
| 4.7173        | 19.0  | 228  | 5.1584          |
| 4.2267        | 20.0  | 240  | 5.1182          |
| 4.2401        | 21.0  | 252  | 5.0173          |
| 4.767         | 22.0  | 264  | 4.9806          |
| 4.0932        | 23.0  | 276  | 4.8975          |
| 4.3266        | 24.0  | 288  | 4.8852          |
| 4.0103        | 25.0  | 300  | 4.7698          |
| 4.1829        | 26.0  | 312  | 4.7993          |
| 4.0862        | 27.0  | 324  | 4.7921          |
| 4.1418        | 28.0  | 336  | 4.7469          |
| 4.0668        | 29.0  | 348  | 4.7108          |
| 4.0318        | 30.0  | 360  | 4.6335          |
| 4.0468        | 31.0  | 372  | 4.6761          |
| 3.9454        | 32.0  | 384  | 4.5814          |
| 3.943         | 33.0  | 396  | 4.5624          |
| 3.5406        | 34.0  | 408  | 4.6243          |
| 3.5091        | 35.0  | 420  | 4.5822          |
| 3.5972        | 36.0  | 432  | 4.4551          |
| 3.711         | 37.0  | 444  | 4.4898          |
| 3.7391        | 38.0  | 456  | 4.4472          |
| 3.7883        | 39.0  | 468  | 4.4188          |
| 3.7508        | 40.0  | 480  | 4.3803          |
| 3.422         | 41.0  | 492  | 4.3539          |
| 3.5801        | 42.0  | 504  | 4.3718          |
| 3.3411        | 43.0  | 516  | 4.3635          |
| 3.5347        | 44.0  | 528  | 4.3381          |
| 3.3136        | 45.0  | 540  | 4.2857          |
| 3.6378        | 46.0  | 552  | 4.2428          |
| 3.9194        | 47.0  | 564  | 4.3143          |
| 3.444         | 48.0  | 576  | 4.2403          |
| 3.5414        | 49.0  | 588  | 4.2614          |
| 3.6703        | 50.0  | 600  | 4.2729          |
| 3.5997        | 51.0  | 612  | 4.2104          |
| 3.1202        | 52.0  | 624  | 4.1948          |
| 3.3409        | 53.0  | 636  | 4.2018          |
| 3.4611        | 54.0  | 648  | 4.1726          |
| 3.1643        | 55.0  | 660  | 4.1776          |
| 3.1082        | 56.0  | 672  | 4.1785          |
| 2.9745        | 57.0  | 684  | 4.1374          |
| 3.3937        | 58.0  | 696  | 4.1434          |
| 3.265         | 59.0  | 708  | 4.1356          |
| 3.0267        | 60.0  | 720  | 4.1474          |
| 3.0632        | 61.0  | 732  | 4.1193          |
| 3.3543        | 62.0  | 744  | 4.0760          |
| 3.519         | 63.0  | 756  | 4.1373          |
| 3.2546        | 64.0  | 768  | 4.0591          |
| 3.0835        | 65.0  | 780  | 4.0572          |
| 3.3228        | 66.0  | 792  | 4.0788          |
| 3.3441        | 67.0  | 804  | 4.0489          |
| 2.9186        | 68.0  | 816  | 4.0360          |
| 3.1519        | 69.0  | 828  | 4.0376          |
| 3.5119        | 70.0  | 840  | 4.0159          |
| 3.1155        | 71.0  | 852  | 4.0070          |
| 3.1899        | 72.0  | 864  | 3.9895          |
| 3.0979        | 73.0  | 876  | 3.9936          |
| 3.1709        | 74.0  | 888  | 3.9997          |
| 3.3529        | 75.0  | 900  | 3.9848          |
| 2.7989        | 76.0  | 912  | 3.9760          |
| 3.1918        | 77.0  | 924  | 3.9693          |
| 2.8472        | 78.0  | 936  | 3.9504          |
| 3.3493        | 79.0  | 948  | 3.9520          |
| 3.5098        | 80.0  | 960  | 3.9401          |
| 3.2381        | 81.0  | 972  | 3.9363          |
| 3.1959        | 82.0  | 984  | 3.9292          |
| 3.4514        | 83.0  | 996  | 3.9128          |
| 2.9119        | 84.0  | 1008 | 3.9194          |
| 3.2452        | 85.0  | 1020 | 3.9038          |
| 3.0657        | 86.0  | 1032 | 3.9168          |
| 2.8583        | 87.0  | 1044 | 3.9018          |
| 3.2229        | 88.0  | 1056 | 3.9000          |
| 2.9973        | 89.0  | 1068 | 3.8906          |
| 3.0533        | 90.0  | 1080 | 3.8818          |
| 3.3813        | 91.0  | 1092 | 3.8715          |
| 3.1559        | 92.0  | 1104 | 3.8639          |
| 3.1343        | 93.0  | 1116 | 3.8674          |
| 2.9604        | 94.0  | 1128 | 3.8690          |
| 3.3522        | 95.0  | 1140 | 3.8646          |
| 2.9739        | 96.0  | 1152 | 3.8589          |
| 2.7854        | 97.0  | 1164 | 3.8559          |
| 2.8544        | 98.0  | 1176 | 3.8445          |
| 2.9875        | 99.0  | 1188 | 3.8434          |
| 3.3395        | 100.0 | 1200 | 3.8402          |
| 2.736         | 101.0 | 1212 | 3.8398          |
| 3.0598        | 102.0 | 1224 | 3.8384          |
| 3.003         | 103.0 | 1236 | 3.8376          |
| 3.0566        | 104.0 | 1248 | 3.8386          |
| 3.1727        | 105.0 | 1260 | 3.8281          |
| 2.9811        | 106.0 | 1272 | 3.8331          |
| 2.7108        | 107.0 | 1284 | 3.8224          |
| 2.6579        | 108.0 | 1296 | 3.8236          |
| 3.1319        | 109.0 | 1308 | 3.8197          |
| 3.1115        | 110.0 | 1320 | 3.8216          |
| 3.0955        | 111.0 | 1332 | 3.8181          |
| 2.6928        | 112.0 | 1344 | 3.8188          |
| 2.9943        | 113.0 | 1356 | 3.8147          |
| 3.0923        | 114.0 | 1368 | 3.8154          |
| 3.1913        | 115.0 | 1380 | 3.8156          |
| 2.9444        | 116.0 | 1392 | 3.8146          |
| 3.0491        | 117.0 | 1404 | 3.8141          |
| 2.7357        | 118.0 | 1416 | 3.8148          |
| 3.0744        | 119.0 | 1428 | 3.8148          |
| 3.1122        | 120.0 | 1440 | 3.8148          |

Framework versions

  • Transformers 4.39.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.0
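
Given these pinned versions, a minimal loading sketch; the hub repository id below is an assumption based on the card title, so adjust it to the actual `user/model` path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id taken from the card title; replace with the
# actual path of this model on the Hugging Face Hub.
model_id = "Baby-Llama-58M-RUN3_3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation as a smoke test.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```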

Model size

  • 46.5M parameters (Safetensors, F32 tensors)