collapse_gemma-2-27b_hs2_accumulate_iter4_sftsd2

This model is a fine-tuned version of google/gemma-2-27b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9397
  • Num Input Tokens Seen: 17213660
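Assuming the reported loss is a mean per-token cross-entropy in nats (the usual convention for causal-LM fine-tuning with the Hugging Face Trainer), it corresponds to an evaluation perplexity of roughly 2.56:

```python
import math

# Reported evaluation loss from the card; assumed to be mean
# per-token cross-entropy in nats.
eval_loss = 0.9397

# Perplexity is the exponential of the per-token cross-entropy.
perplexity = math.exp(eval_loss)
print(round(perplexity, 2))  # ~2.56
```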

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
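The total train batch size above follows from the per-device batch size and gradient accumulation. A minimal sketch of the arithmetic, assuming a single device (the card does not list a device count):

```python
# Effective (total) train batch size =
#   per-device batch size x gradient accumulation steps x number of devices
train_batch_size = 4
gradient_accumulation_steps = 32
num_devices = 1  # assumption: not stated in the card

total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)
print(total_train_batch_size)  # matches the reported 128
```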

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 2.8683        | 0.0141 | 5    | 1.0844          | 236836            |
| 2.4917        | 0.0282 | 10   | 1.0063          | 477060            |
| 2.4805        | 0.0423 | 15   | 0.9925          | 726020            |
| 2.4109        | 0.0564 | 20   | 0.9841          | 973680            |
| 2.296         | 0.0705 | 25   | 0.9858          | 1223540           |
| 2.0121        | 0.0846 | 30   | 0.9919          | 1470300           |
| 1.7927        | 0.0987 | 35   | 0.9894          | 1718324           |
| 1.8281        | 0.1128 | 40   | 0.9977          | 1962952           |
| 1.7373        | 0.1268 | 45   | 0.9951          | 2208104           |
| 1.5941        | 0.1409 | 50   | 0.9881          | 2455412           |
| 1.6058        | 0.1550 | 55   | 0.9857          | 2696832           |
| 1.0647        | 0.1691 | 60   | 0.9818          | 2941868           |
| 1.1676        | 0.1832 | 65   | 0.9758          | 3188060           |
| 1.2806        | 0.1973 | 70   | 0.9758          | 3427696           |
| 1.0585        | 0.2114 | 75   | 0.9734          | 3672608           |
| 1.0442        | 0.2255 | 80   | 0.9696          | 3910204           |
| 1.0145        | 0.2396 | 85   | 0.9699          | 4146872           |
| 1.0364        | 0.2537 | 90   | 0.9652          | 4394008           |
| 1.0252        | 0.2678 | 95   | 0.9647          | 4635300           |
| 0.969         | 0.2819 | 100  | 0.9630          | 4879116           |
| 0.7795        | 0.2960 | 105  | 0.9612          | 5118936           |
| 0.8606        | 0.3101 | 110  | 0.9571          | 5366792           |
| 1.0389        | 0.3242 | 115  | 0.9581          | 5612876           |
| 0.8369        | 0.3383 | 120  | 0.9558          | 5861964           |
| 0.8261        | 0.3524 | 125  | 0.9563          | 6109352           |
| 0.7797        | 0.3665 | 130  | 0.9521          | 6350016           |
| 0.91          | 0.3805 | 135  | 0.9539          | 6594400           |
| 0.9656        | 0.3946 | 140  | 0.9528          | 6829540           |
| 0.8705        | 0.4087 | 145  | 0.9517          | 7073132           |
| 0.9275        | 0.4228 | 150  | 0.9501          | 7317792           |
| 0.7878        | 0.4369 | 155  | 0.9495          | 7562692           |
| 0.79          | 0.4510 | 160  | 0.9493          | 7804712           |
| 0.9756        | 0.4651 | 165  | 0.9486          | 8045908           |
| 0.831         | 0.4792 | 170  | 0.9501          | 8295248           |
| 0.7312        | 0.4933 | 175  | 0.9482          | 8539448           |
| 0.8828        | 0.5074 | 180  | 0.9462          | 8782312           |
| 0.654         | 0.5215 | 185  | 0.9476          | 9028520           |
| 0.9007        | 0.5356 | 190  | 0.9451          | 9272816           |
| 0.7856        | 0.5497 | 195  | 0.9463          | 9519724           |
| 0.6986        | 0.5638 | 200  | 0.9445          | 9769440           |
| 0.8185        | 0.5779 | 205  | 0.9482          | 10012624          |
| 0.7951        | 0.5920 | 210  | 0.9453          | 10257436          |
| 0.7885        | 0.6061 | 215  | 0.9442          | 10497084          |
| 0.8135        | 0.6202 | 220  | 0.9452          | 10726612          |
| 0.8553        | 0.6342 | 225  | 0.9432          | 10964756          |
| 0.7149        | 0.6483 | 230  | 0.9454          | 11206028          |
| 0.796         | 0.6624 | 235  | 0.9439          | 11446772          |
| 0.7876        | 0.6765 | 240  | 0.9443          | 11686044          |
| 0.7328        | 0.6906 | 245  | 0.9433          | 11936452          |
| 0.8117        | 0.7047 | 250  | 0.9431          | 12174492          |
| 0.9161        | 0.7188 | 255  | 0.9400          | 12410412          |
| 0.6793        | 0.7329 | 260  | 0.9424          | 12649736          |
| 0.7372        | 0.7470 | 265  | 0.9430          | 12887028          |
| 0.6329        | 0.7611 | 270  | 0.9402          | 13126712          |
| 0.8913        | 0.7752 | 275  | 0.9416          | 13368188          |
| 0.83          | 0.7893 | 280  | 0.9409          | 13615264          |
| 0.6657        | 0.8034 | 285  | 0.9400          | 13855436          |
| 0.9027        | 0.8175 | 290  | 0.9404          | 14102064          |
| 0.7206        | 0.8316 | 295  | 0.9401          | 14340172          |
| 0.7678        | 0.8457 | 300  | 0.9399          | 14573172          |
| 0.8187        | 0.8598 | 305  | 0.9401          | 14816224          |
| 0.6861        | 0.8739 | 310  | 0.9399          | 15065152          |
| 0.8274        | 0.8879 | 315  | 0.9384          | 15306488          |
| 0.8374        | 0.9020 | 320  | 0.9391          | 15543972          |
| 0.7515        | 0.9161 | 325  | 0.9370          | 15780660          |
| 0.8439        | 0.9302 | 330  | 0.9393          | 16027512          |
| 0.7666        | 0.9443 | 335  | 0.9410          | 16271828          |
| 0.7781        | 0.9584 | 340  | 0.9404          | 16516708          |
| 0.77          | 0.9725 | 345  | 0.9435          | 16772604          |
| 0.6227        | 0.9866 | 350  | 0.9362          | 17015180          |
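With constant_with_warmup and a warmup ratio of 0.05, the warmup length depends on the total number of optimizer steps, which the card does not state directly but which can be estimated from the logged schedule (step 350 lands at epoch 0.9866, so one epoch is roughly 355 steps and warmup covers about the first 18). A back-of-envelope sketch, assuming these logged values:

```python
# Estimate steps per epoch and warmup steps from the last logged row.
logged_step, logged_epoch = 350, 0.9866  # from the training results above
warmup_ratio = 0.05
num_epochs = 1

steps_per_epoch = logged_step / logged_epoch          # ~354.8
total_steps = round(steps_per_epoch * num_epochs)     # ~355
warmup_steps = round(total_steps * warmup_ratio)      # ~18
print(total_steps, warmup_steps)
```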

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Safetensors

  • Model size: 27.2B params
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter4_sftsd2

  • Base model: google/gemma-2-27b (this model is one of its fine-tunes)