collapse_gemma-2-27b_hs2_accumulate_iter4_sftsd0

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9428
  • Num Input Tokens Seen: 17441212

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
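
A quick way to sanity-check the quantities derived from this list (a sketch in plain Python; the warmup-step rounding is an assumption, since the exact value depends on the trainer implementation):

```python
import math

# Values taken from the hyperparameter list above
train_batch_size = 4            # per-device micro-batch size
gradient_accumulation_steps = 32
warmup_ratio = 0.05
total_optimizer_steps = 350     # final step logged in the results table below

# Effective batch size: micro-batch size x accumulation steps
total_train_batch_size = train_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 128  # matches the reported total_train_batch_size

# constant_with_warmup: the LR ramps up over the first warmup_ratio fraction
# of the run, then holds at 8e-06 for the remainder (rounding assumed here)
warmup_steps = math.ceil(warmup_ratio * total_optimizer_steps)
print(total_train_batch_size, warmup_steps)
```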

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.1282 | 0 |
| 2.7026 | 0.0142 | 5 | 1.0845 | 250572 |
| 2.7951 | 0.0285 | 10 | 1.0059 | 495468 |
| 2.4024 | 0.0427 | 15 | 0.9903 | 740552 |
| 2.3813 | 0.0570 | 20 | 0.9834 | 983128 |
| 2.0909 | 0.0712 | 25 | 0.9822 | 1231348 |
| 2.0832 | 0.0855 | 30 | 0.9854 | 1485220 |
| 2.2061 | 0.0997 | 35 | 0.9889 | 1739976 |
| 1.8943 | 0.1140 | 40 | 0.9868 | 1994708 |
| 1.8237 | 0.1282 | 45 | 0.9788 | 2248268 |
| 1.6061 | 0.1425 | 50 | 0.9833 | 2491904 |
| 1.645 | 0.1567 | 55 | 0.9800 | 2745000 |
| 1.5498 | 0.1710 | 60 | 0.9792 | 2988148 |
| 1.2707 | 0.1852 | 65 | 0.9792 | 3237400 |
| 1.2508 | 0.1995 | 70 | 0.9746 | 3494232 |
| 1.2433 | 0.2137 | 75 | 0.9708 | 3747944 |
| 1.1545 | 0.2280 | 80 | 0.9691 | 3990240 |
| 1.3564 | 0.2422 | 85 | 0.9691 | 4234336 |
| 1.1692 | 0.2565 | 90 | 0.9681 | 4481372 |
| 1.1797 | 0.2707 | 95 | 0.9646 | 4733204 |
| 1.1292 | 0.2850 | 100 | 0.9630 | 4979876 |
| 1.034 | 0.2992 | 105 | 0.9641 | 5219284 |
| 1.0656 | 0.3135 | 110 | 0.9605 | 5467328 |
| 1.0678 | 0.3277 | 115 | 0.9588 | 5723652 |
| 1.0246 | 0.3420 | 120 | 0.9581 | 5975880 |
| 1.1025 | 0.3562 | 125 | 0.9580 | 6219980 |
| 1.0895 | 0.3705 | 130 | 0.9559 | 6475528 |
| 0.9828 | 0.3847 | 135 | 0.9546 | 6724216 |
| 0.9003 | 0.3990 | 140 | 0.9516 | 6971248 |
| 0.9099 | 0.4132 | 145 | 0.9538 | 7219644 |
| 0.9169 | 0.4275 | 150 | 0.9503 | 7471332 |
| 0.9124 | 0.4417 | 155 | 0.9517 | 7725516 |
| 0.9038 | 0.4560 | 160 | 0.9509 | 7974732 |
| 0.9577 | 0.4702 | 165 | 0.9490 | 8222880 |
| 1.0668 | 0.4845 | 170 | 0.9486 | 8463156 |
| 1.0556 | 0.4987 | 175 | 0.9484 | 8711816 |
| 0.958 | 0.5130 | 180 | 0.9446 | 8964120 |
| 0.7769 | 0.5272 | 185 | 0.9472 | 9212680 |
| 0.7975 | 0.5415 | 190 | 0.9450 | 9459576 |
| 0.8965 | 0.5557 | 195 | 0.9442 | 9711232 |
| 0.9835 | 0.5700 | 200 | 0.9461 | 9962788 |
| 0.9513 | 0.5842 | 205 | 0.9421 | 10215452 |
| 0.9281 | 0.5985 | 210 | 0.9448 | 10468768 |
| 0.819 | 0.6127 | 215 | 0.9426 | 10711836 |
| 0.8368 | 0.6269 | 220 | 0.9454 | 10963464 |
| 0.8332 | 0.6412 | 225 | 0.9419 | 11211872 |
| 1.1059 | 0.6554 | 230 | 0.9416 | 11468040 |
| 0.7919 | 0.6697 | 235 | 0.9409 | 11711864 |
| 0.7565 | 0.6839 | 240 | 0.9414 | 11960556 |
| 0.6964 | 0.6982 | 245 | 0.9424 | 12207416 |
| 0.92 | 0.7124 | 250 | 0.9419 | 12449244 |
| 0.7462 | 0.7267 | 255 | 0.9402 | 12696604 |
| 1.0246 | 0.7409 | 260 | 0.9435 | 12946160 |
| 0.7697 | 0.7552 | 265 | 0.9396 | 13199664 |
| 0.6771 | 0.7694 | 270 | 0.9407 | 13444784 |
| 0.7791 | 0.7837 | 275 | 0.9394 | 13700124 |
| 0.9775 | 0.7979 | 280 | 0.9422 | 13953992 |
| 0.9798 | 0.8122 | 285 | 0.9381 | 14204856 |
| 0.8106 | 0.8264 | 290 | 0.9395 | 14451212 |
| 0.8597 | 0.8407 | 295 | 0.9400 | 14702444 |
| 0.9122 | 0.8549 | 300 | 0.9450 | 14954496 |
| 0.8738 | 0.8692 | 305 | 0.9410 | 15199504 |
| 0.8448 | 0.8834 | 310 | 0.9375 | 15449288 |
| 0.7054 | 0.8977 | 315 | 0.9385 | 15692504 |
| 0.9606 | 0.9119 | 320 | 0.9380 | 15942552 |
| 1.0059 | 0.9262 | 325 | 0.9357 | 16189660 |
| 0.703 | 0.9404 | 330 | 0.9405 | 16441124 |
| 0.9094 | 0.9547 | 335 | 0.9358 | 16688128 |
| 0.8983 | 0.9689 | 340 | 0.9388 | 16938972 |
| 0.86 | 0.9832 | 345 | 0.9368 | 17187008 |
| 0.8023 | 0.9974 | 350 | 0.9428 | 17441212 |
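
The headline metrics at the top of this card can be cross-checked against the first and last rows of this table (a small sketch with values hard-coded from the table):

```python
# (step, validation_loss, input_tokens_seen) from the first and last evals
first = (0, 1.1282, 0)
last = (350, 0.9428, 17_441_212)

# The final eval matches the reported Loss and Num Input Tokens Seen
assert last[1] == 0.9428
assert last[2] == 17_441_212

drop = first[1] - last[1]
print(f"validation loss fell by {drop:.4f} over {last[0]} optimizer steps")
```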

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Model size: 27.2B params (Safetensors, tensor type BF16)

Model tree for RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter4_sftsd0

  • Base model: google/gemma-2-27b