collapse_gemma-2-27b_hs2_accumulate_iter4_sftsd1

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9369
  • Num Input Tokens Seen: 17,420,796
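
For context, the reported cross-entropy loss converts directly to perplexity; a quick check (pure arithmetic, no model weights needed):

```python
import math

eval_loss = 0.9369          # final validation loss reported above
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.2f}")  # ≈ 2.55
```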

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
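
A minimal sketch of how these settings fit together. The warmup step count below is an estimate, not a value reported in the card: warmup_ratio 0.05 of the roughly 347 optimizer steps implied by the results table (step 345 at epoch 0.9944) gives about 18 warmup steps.

```python
# constant_with_warmup: the learning rate ramps linearly up to base_lr,
# then stays constant for the rest of training.
def lr_at(step: int, base_lr: float = 8e-06, warmup_steps: int = 18) -> float:
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr

# Effective batch size: per-device batch x gradient accumulation steps.
effective_batch = 4 * 32  # = 128, matching total_train_batch_size
```

With these settings the optimizer takes one step per 128 examples, and the learning rate sits flat at 8e-06 after roughly the first 18 steps.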

Training results

Training Loss Epoch Step Validation Loss Input Tokens Seen
No log 0 0 1.1282 0
2.6692 0.0144 5 1.0838 251232
2.6572 0.0288 10 1.0019 504724
2.4487 0.0432 15 0.9898 759272
2.3685 0.0576 20 0.9814 1007344
2.2607 0.0721 25 0.9869 1261444
2.1841 0.0865 30 0.9878 1511004
1.9613 0.1009 35 0.9908 1763396
1.9138 0.1153 40 0.9865 2017584
1.7242 0.1297 45 0.9835 2271904
1.56 0.1441 50 0.9825 2528656
1.5102 0.1585 55 0.9806 2772068
1.4168 0.1729 60 0.9775 3023420
1.4362 0.1874 65 0.9754 3276920
1.3918 0.2018 70 0.9761 3531492
1.5127 0.2162 75 0.9706 3784992
1.3944 0.2306 80 0.9733 4032436
1.1925 0.2450 85 0.9723 4273560
1.183 0.2594 90 0.9640 4520508
1.2304 0.2738 95 0.9646 4770368
1.0872 0.2882 100 0.9648 5020016
1.1574 0.3026 105 0.9607 5276716
1.1035 0.3171 110 0.9611 5521372
1.0914 0.3315 115 0.9585 5776324
0.9998 0.3459 120 0.9598 6022272
0.9534 0.3603 125 0.9555 6260392
1.0917 0.3747 130 0.9535 6521380
1.1094 0.3891 135 0.9535 6769228
1.1871 0.4035 140 0.9526 7024704
0.9796 0.4179 145 0.9514 7273240
1.0659 0.4324 150 0.9495 7525180
1.1488 0.4468 155 0.9484 7775292
0.9887 0.4612 160 0.9497 8016808
1.1045 0.4756 165 0.9451 8266100
1.0371 0.4900 170 0.9465 8514128
1.0966 0.5044 175 0.9450 8763440
1.0408 0.5188 180 0.9460 9017676
1.0891 0.5332 185 0.9435 9265972
1.0561 0.5476 190 0.9450 9522024
0.9537 0.5621 195 0.9434 9764580
0.9373 0.5765 200 0.9431 10016796
1.1323 0.5909 205 0.9423 10269756
1.2019 0.6053 210 0.9438 10520656
0.9699 0.6197 215 0.9416 10771848
0.9654 0.6341 220 0.9426 11022436
0.9461 0.6485 225 0.9405 11274272
0.9865 0.6629 230 0.9414 11531652
0.9315 0.6774 235 0.9391 11784148
0.9826 0.6918 240 0.9406 12037420
0.984 0.7062 245 0.9396 12295780
1.1796 0.7206 250 0.9419 12550852
1.0881 0.7350 255 0.9367 12796424
0.8628 0.7494 260 0.9386 13048276
1.094 0.7638 265 0.9372 13302068
1.0862 0.7782 270 0.9385 13552976
1.0226 0.7926 275 0.9375 13805560
0.9964 0.8071 280 0.9359 14063732
1.0379 0.8215 285 0.9368 14323416
0.7735 0.8359 290 0.9365 14578864
0.8855 0.8503 295 0.9354 14831324
0.9687 0.8647 300 0.9368 15079640
1.0087 0.8791 305 0.9351 15336076
0.8832 0.8935 310 0.9368 15598480
0.9207 0.9079 315 0.9353 15852360
0.9436 0.9224 320 0.9372 16105580
1.0136 0.9368 325 0.9360 16360756
0.9331 0.9512 330 0.9334 16610568
0.8251 0.9656 335 0.9353 16866280
0.8415 0.9800 340 0.9334 17114340
1.0314 0.9944 345 0.9360 17367496
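
A back-of-the-envelope check derived from the last logged row above (illustrative only; it assumes the token counter includes every token in each batch):

```python
tokens_seen = 17_367_496    # input tokens seen at the last logged step
steps = 345                 # optimizer steps at that point
total_batch = 128           # total_train_batch_size

tokens_per_step = tokens_seen / steps               # ≈ 50,300 tokens/step
tokens_per_example = tokens_per_step / total_batch  # ≈ 393 tokens/example
```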

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Safetensors

  • Model size: 27.2B params
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter4_sftsd1

Base model

  • google/gemma-2-27b

This model is a direct fine-tune of the base model (one of 52 fine-tunes listed on the Hub).