# collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd2
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.0989
- Num Input Tokens Seen: 13720456
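
A minimal loading sketch, assuming the checkpoint is published on the Hub as jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd2 (the repository this card belongs to); nothing beyond standard `transformers` generation is implied:

```python
# Minimal usage sketch; the repo id is assumed to match this model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Write a short haiku about fine-tuning:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```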
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
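
A hedged sketch of how these values map onto `transformers.TrainingArguments`; the actual training script and dataset are not documented here, and the `output_dir` name is illustrative:

```python
# Hedged configuration sketch: maps the hyperparameters listed above onto
# transformers.TrainingArguments. Dataset loading and the Trainer call are
# omitted because the training data is not documented.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd2",  # illustrative
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```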
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3956 | 0 |
1.5681 | 0.0206 | 5 | 1.3561 | 281208 |
1.3762 | 0.0412 | 10 | 1.2366 | 561408 |
1.2447 | 0.0618 | 15 | 1.1711 | 846520 |
1.2027 | 0.0824 | 20 | 1.1444 | 1123552 |
1.3092 | 0.1030 | 25 | 1.1172 | 1406248 |
1.1897 | 0.1236 | 30 | 1.1186 | 1693528 |
1.0685 | 0.1441 | 35 | 1.1195 | 1979856 |
0.9925 | 0.1647 | 40 | 1.1286 | 2262504 |
1.0026 | 0.1853 | 45 | 1.1277 | 2544760 |
0.9181 | 0.2059 | 50 | 1.1374 | 2825680 |
0.9007 | 0.2265 | 55 | 1.1411 | 3103440 |
0.8626 | 0.2471 | 60 | 1.1421 | 3388104 |
0.8576 | 0.2677 | 65 | 1.1406 | 3668152 |
0.9025 | 0.2883 | 70 | 1.1459 | 3947528 |
0.8566 | 0.3089 | 75 | 1.1449 | 4229392 |
0.8071 | 0.3295 | 80 | 1.1467 | 4514912 |
0.7788 | 0.3501 | 85 | 1.1398 | 4800168 |
0.7999 | 0.3707 | 90 | 1.1427 | 5085472 |
0.7548 | 0.3912 | 95 | 1.1401 | 5370096 |
0.7775 | 0.4118 | 100 | 1.1324 | 5654448 |
0.6659 | 0.4324 | 105 | 1.1390 | 5932488 |
0.7151 | 0.4530 | 110 | 1.1345 | 6217432 |
0.7126 | 0.4736 | 115 | 1.1303 | 6504472 |
0.5812 | 0.4942 | 120 | 1.1395 | 6786136 |
0.7462 | 0.5148 | 125 | 1.1331 | 7075544 |
0.6824 | 0.5354 | 130 | 1.1306 | 7349632 |
0.7777 | 0.5560 | 135 | 1.1333 | 7638056 |
0.614 | 0.5766 | 140 | 1.1285 | 7926232 |
0.6151 | 0.5972 | 145 | 1.1264 | 8206848 |
0.7309 | 0.6178 | 150 | 1.1235 | 8494256 |
0.6219 | 0.6384 | 155 | 1.1226 | 8771192 |
0.6518 | 0.6589 | 160 | 1.1194 | 9060384 |
0.6101 | 0.6795 | 165 | 1.1167 | 9344632 |
0.6374 | 0.7001 | 170 | 1.1139 | 9625824 |
0.6431 | 0.7207 | 175 | 1.1153 | 9909464 |
0.6351 | 0.7413 | 180 | 1.1112 | 10193712 |
0.6205 | 0.7619 | 185 | 1.1099 | 10473824 |
0.5593 | 0.7825 | 190 | 1.1086 | 10757760 |
0.6611 | 0.8031 | 195 | 1.1067 | 11044304 |
0.604 | 0.8237 | 200 | 1.1089 | 11335648 |
0.5985 | 0.8443 | 205 | 1.1045 | 11616672 |
0.6425 | 0.8649 | 210 | 1.1041 | 11904256 |
0.6244 | 0.8855 | 215 | 1.1036 | 12186800 |
0.4801 | 0.9060 | 220 | 1.1015 | 12472520 |
0.5418 | 0.9266 | 225 | 1.1026 | 12757120 |
0.5693 | 0.9472 | 230 | 1.0992 | 13037120 |
0.6361 | 0.9678 | 235 | 1.0997 | 13321752 |
0.5677 | 0.9884 | 240 | 1.0984 | 13608048 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1