---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2
  results: []
---
# collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1032
- Num Input Tokens Seen: 21819352
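
Since the card does not yet include usage instructions, here is a minimal loading sketch. The repo id below is assumed to match this card's model name, and the dtype and prompt are illustrative choices, not part of the original setup.

```python
# Minimal sketch: load the fine-tuned checkpoint and generate text.
# The repo id is an assumption based on this card's model name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```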
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
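
As a reproducibility aid, the sketch below maps the hyperparameters above onto a TRL `SFTConfig`/`SFTTrainer` call. Only the listed values come from this card; the training data is undocumented, so the dataset line, the `dataset_text_field`, and the `output_dir` are placeholders.

```python
# Sketch of a TRL SFT run matching the listed hyperparameters.
# Dataset, text field, and output_dir are assumptions, not from the card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")

# Placeholder: the actual training dataset is unknown.
train_dataset = load_dataset("json", data_files="train.jsonl", split="train")

args = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2",
    dataset_text_field="text",       # assumed column name
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```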
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3956 | 0 |
1.5432 | 0.0127 | 5 | 1.3798 | 276728 |
1.4208 | 0.0254 | 10 | 1.2917 | 554256 |
1.4236 | 0.0381 | 15 | 1.2111 | 833616 |
1.3033 | 0.0508 | 20 | 1.1647 | 1109744 |
1.2167 | 0.0634 | 25 | 1.1518 | 1384696 |
1.0953 | 0.0761 | 30 | 1.1341 | 1664008 |
0.9168 | 0.0888 | 35 | 1.1461 | 1944176 |
0.9273 | 0.1015 | 40 | 1.1542 | 2218368 |
0.8943 | 0.1142 | 45 | 1.1696 | 2492552 |
0.8168 | 0.1269 | 50 | 1.1792 | 2773488 |
0.7781 | 0.1396 | 55 | 1.1739 | 3050208 |
0.8131 | 0.1523 | 60 | 1.1845 | 3326584 |
0.6973 | 0.1649 | 65 | 1.1836 | 3606104 |
0.7054 | 0.1776 | 70 | 1.1733 | 3887952 |
0.685 | 0.1903 | 75 | 1.1764 | 4170752 |
0.5768 | 0.2030 | 80 | 1.1771 | 4444816 |
0.6494 | 0.2157 | 85 | 1.1719 | 4718552 |
0.5484 | 0.2284 | 90 | 1.1698 | 4998784 |
0.5609 | 0.2411 | 95 | 1.1739 | 5274536 |
0.4343 | 0.2538 | 100 | 1.1755 | 5553760 |
0.5656 | 0.2665 | 105 | 1.1654 | 5828328 |
0.5633 | 0.2791 | 110 | 1.1696 | 6104712 |
0.4485 | 0.2918 | 115 | 1.1631 | 6380840 |
0.4853 | 0.3045 | 120 | 1.1658 | 6651752 |
0.4552 | 0.3172 | 125 | 1.1593 | 6928872 |
0.4465 | 0.3299 | 130 | 1.1584 | 7200200 |
0.4402 | 0.3426 | 135 | 1.1605 | 7481976 |
0.4228 | 0.3553 | 140 | 1.1536 | 7765000 |
0.5075 | 0.3680 | 145 | 1.1529 | 8037040 |
0.3783 | 0.3807 | 150 | 1.1505 | 8313288 |
0.4 | 0.3933 | 155 | 1.1464 | 8593584 |
0.4482 | 0.4060 | 160 | 1.1507 | 8869384 |
0.4995 | 0.4187 | 165 | 1.1418 | 9145296 |
0.4386 | 0.4314 | 170 | 1.1420 | 9423816 |
0.3944 | 0.4441 | 175 | 1.1406 | 9707024 |
0.5069 | 0.4568 | 180 | 1.1408 | 9977424 |
0.36 | 0.4695 | 185 | 1.1408 | 10247568 |
0.4558 | 0.4822 | 190 | 1.1369 | 10525312 |
0.4699 | 0.4948 | 195 | 1.1341 | 10807080 |
0.5118 | 0.5075 | 200 | 1.1346 | 11075200 |
0.5246 | 0.5202 | 205 | 1.1310 | 11355128 |
0.5085 | 0.5329 | 210 | 1.1323 | 11635976 |
0.3497 | 0.5456 | 215 | 1.1290 | 11912608 |
0.4282 | 0.5583 | 220 | 1.1304 | 12191360 |
0.3405 | 0.5710 | 225 | 1.1261 | 12468896 |
0.4814 | 0.5837 | 230 | 1.1271 | 12748408 |
0.3857 | 0.5964 | 235 | 1.1262 | 13023016 |
0.4579 | 0.6090 | 240 | 1.1245 | 13302328 |
0.4054 | 0.6217 | 245 | 1.1244 | 13575408 |
0.4019 | 0.6344 | 250 | 1.1222 | 13851880 |
0.4085 | 0.6471 | 255 | 1.1206 | 14126456 |
0.3261 | 0.6598 | 260 | 1.1226 | 14411880 |
0.3434 | 0.6725 | 265 | 1.1197 | 14693704 |
0.3898 | 0.6852 | 270 | 1.1189 | 14972552 |
0.3275 | 0.6979 | 275 | 1.1202 | 15244856 |
0.3851 | 0.7105 | 280 | 1.1181 | 15517984 |
0.3896 | 0.7232 | 285 | 1.1167 | 15793480 |
0.4382 | 0.7359 | 290 | 1.1164 | 16072136 |
0.4112 | 0.7486 | 295 | 1.1147 | 16347632 |
0.4165 | 0.7613 | 300 | 1.1153 | 16622200 |
0.3549 | 0.7740 | 305 | 1.1137 | 16896656 |
0.3859 | 0.7867 | 310 | 1.1130 | 17175712 |
0.3636 | 0.7994 | 315 | 1.1129 | 17456320 |
0.4647 | 0.8121 | 320 | 1.1109 | 17735952 |
0.3973 | 0.8247 | 325 | 1.1121 | 18011048 |
0.3857 | 0.8374 | 330 | 1.1100 | 18285984 |
0.3692 | 0.8501 | 335 | 1.1105 | 18560024 |
0.4178 | 0.8628 | 340 | 1.1092 | 18834584 |
0.3232 | 0.8755 | 345 | 1.1070 | 19113832 |
0.3482 | 0.8882 | 350 | 1.1070 | 19390200 |
0.4256 | 0.9009 | 355 | 1.1065 | 19670664 |
0.4421 | 0.9136 | 360 | 1.1040 | 19946664 |
0.4513 | 0.9262 | 365 | 1.1046 | 20229584 |
0.395 | 0.9389 | 370 | 1.1059 | 20503736 |
0.3129 | 0.9516 | 375 | 1.1033 | 20776680 |
0.3915 | 0.9643 | 380 | 1.1048 | 21053616 |
0.3239 | 0.9770 | 385 | 1.1003 | 21327312 |
0.3765 | 0.9897 | 390 | 1.1039 | 21601936 |
### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
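
To check a local environment against these pins, a quick sketch (the list above remains the authoritative source):

```python
# Print installed versions to compare against the pins above.
import datasets
import tokenizers
import torch
import transformers

print("Transformers:", transformers.__version__)  # expected 4.44.0
print("PyTorch:", torch.__version__)              # expected 2.4.0+cu121
print("Datasets:", datasets.__version__)          # expected 2.20.0
print("Tokenizers:", tokenizers.__version__)      # expected 0.19.1
```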