collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd2

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9531
  • Num Input Tokens Seen: 19303096
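Pending a fuller model description, below is a minimal loading sketch. It is not part of the original card: the hub repo id is taken from this page's model tree, BF16 loading matches the stored tensor type, and the prompt is purely illustrative.

```python
# Minimal usage sketch (assumed, not from the original card): load the
# fine-tuned checkpoint and generate a short completion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",           # requires accelerate; places layers automatically
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```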

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent Trainer configuration follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
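The per-device batch size of 4 combined with 32 gradient-accumulation steps yields the effective batch size of 128 listed above (4 × 32 on a single device). Here is a minimal sketch of how these settings map onto the Hugging Face Trainer API; the output directory and BF16 flag are assumptions, since the original training script is not included in the card.

```python
# Sketch only: maps the hyperparameters listed above onto TrainingArguments.
# output_dir and bf16 are assumptions; the Adam betas/epsilon match the card
# and are also the Trainer defaults.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter4_sftsd2",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,   # 4 * 32 = 128 effective train batch
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,                        # assumed, matching the BF16 checkpoint
)
```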

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-:|:-:|:-:|:-:|:-:|
| No log | 0 | 0 | 1.2335 | 0 |
| 1.4118 | 0.0133 | 5 | 1.1826 | 257860 |
| 1.2214 | 0.0266 | 10 | 1.0631 | 522104 |
| 1.0564 | 0.0400 | 15 | 1.0225 | 780584 |
| 0.7497 | 0.0533 | 20 | 1.0114 | 1036328 |
| 0.6525 | 0.0666 | 25 | 1.0250 | 1295960 |
| 0.4833 | 0.0799 | 30 | 1.0213 | 1552720 |
| 0.4335 | 0.0933 | 35 | 1.0174 | 1808712 |
| 0.3836 | 0.1066 | 40 | 1.0126 | 2061276 |
| 0.4176 | 0.1199 | 45 | 1.0100 | 2316160 |
| 0.3447 | 0.1332 | 50 | 1.0023 | 2572684 |
| 0.3402 | 0.1466 | 55 | 0.9984 | 2828056 |
| 0.3671 | 0.1599 | 60 | 0.9914 | 3084180 |
| 0.3605 | 0.1732 | 65 | 0.9913 | 3341524 |
| 0.3938 | 0.1865 | 70 | 0.9866 | 3608220 |
| 0.3298 | 0.1999 | 75 | 0.9840 | 3864084 |
| 0.3437 | 0.2132 | 80 | 0.9800 | 4125920 |
| 0.4241 | 0.2265 | 85 | 0.9796 | 4376200 |
| 0.3798 | 0.2398 | 90 | 0.9779 | 4636060 |
| 0.3598 | 0.2531 | 95 | 0.9747 | 4894472 |
| 0.401 | 0.2665 | 100 | 0.9730 | 5157976 |
| 0.3151 | 0.2798 | 105 | 0.9742 | 5414044 |
| 0.3781 | 0.2931 | 110 | 0.9713 | 5673020 |
| 0.4242 | 0.3064 | 115 | 0.9694 | 5930676 |
| 0.3515 | 0.3198 | 120 | 0.9692 | 6195360 |
| 0.2744 | 0.3331 | 125 | 0.9673 | 6452160 |
| 0.3215 | 0.3464 | 130 | 0.9655 | 6702024 |
| 0.3921 | 0.3597 | 135 | 0.9647 | 6952796 |
| 0.3987 | 0.3731 | 140 | 0.9633 | 7216020 |
| 0.3074 | 0.3864 | 145 | 0.9640 | 7474692 |
| 0.3314 | 0.3997 | 150 | 0.9631 | 7739548 |
| 0.3048 | 0.4130 | 155 | 0.9610 | 8005920 |
| 0.3229 | 0.4263 | 160 | 0.9626 | 8259444 |
| 0.2944 | 0.4397 | 165 | 0.9617 | 8514840 |
| 0.2932 | 0.4530 | 170 | 0.9619 | 8772880 |
| 0.2929 | 0.4663 | 175 | 0.9613 | 9032612 |
| 0.3491 | 0.4796 | 180 | 0.9602 | 9285936 |
| 0.3658 | 0.4930 | 185 | 0.9611 | 9541684 |
| 0.2627 | 0.5063 | 190 | 0.9609 | 9796096 |
| 0.3652 | 0.5196 | 195 | 0.9597 | 10054920 |
| 0.2474 | 0.5329 | 200 | 0.9593 | 10310456 |
| 0.3399 | 0.5463 | 205 | 0.9610 | 10566224 |
| 0.293 | 0.5596 | 210 | 0.9584 | 10821340 |
| 0.332 | 0.5729 | 215 | 0.9575 | 11080028 |
| 0.3365 | 0.5862 | 220 | 0.9576 | 11339624 |
| 0.3079 | 0.5996 | 225 | 0.9569 | 11596368 |
| 0.3383 | 0.6129 | 230 | 0.9568 | 11846020 |
| 0.3074 | 0.6262 | 235 | 0.9568 | 12097444 |
| 0.2863 | 0.6395 | 240 | 0.9555 | 12360820 |
| 0.3494 | 0.6528 | 245 | 0.9550 | 12619744 |
| 0.3301 | 0.6662 | 250 | 0.9564 | 12879604 |
| 0.2942 | 0.6795 | 255 | 0.9556 | 13133500 |
| 0.2745 | 0.6928 | 260 | 0.9545 | 13387256 |
| 0.2444 | 0.7061 | 265 | 0.9553 | 13653368 |
| 0.2921 | 0.7195 | 270 | 0.9563 | 13909732 |
| 0.256 | 0.7328 | 275 | 0.9558 | 14169440 |
| 0.3005 | 0.7461 | 280 | 0.9538 | 14430448 |
| 0.2816 | 0.7594 | 285 | 0.9529 | 14687952 |
| 0.3103 | 0.7728 | 290 | 0.9544 | 14938896 |
| 0.2936 | 0.7861 | 295 | 0.9562 | 15191276 |
| 0.3045 | 0.7994 | 300 | 0.9557 | 15445016 |
| 0.3128 | 0.8127 | 305 | 0.9543 | 15698372 |
| 0.247 | 0.8260 | 310 | 0.9546 | 15955200 |
| 0.3089 | 0.8394 | 315 | 0.9561 | 16215480 |
| 0.2694 | 0.8527 | 320 | 0.9561 | 16473036 |
| 0.2898 | 0.8660 | 325 | 0.9539 | 16732368 |
| 0.324 | 0.8793 | 330 | 0.9547 | 16988880 |
| 0.3019 | 0.8927 | 335 | 0.9555 | 17249604 |
| 0.3881 | 0.9060 | 340 | 0.9563 | 17504072 |
| 0.1919 | 0.9193 | 345 | 0.9545 | 17763736 |
| 0.2813 | 0.9326 | 350 | 0.9524 | 18029128 |
| 0.3241 | 0.9460 | 355 | 0.9538 | 18283752 |
| 0.2958 | 0.9593 | 360 | 0.9570 | 18535884 |
| 0.3128 | 0.9726 | 365 | 0.9527 | 18795248 |
| 0.287 | 0.9859 | 370 | 0.9508 | 19049296 |
| 0.2516 | 0.9993 | 375 | 0.9531 | 19303096 |
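Most of the improvement arrives within the first 20 optimizer steps, after which validation loss drifts down slowly from about 1.01 to 0.95. Read as mean token-level cross-entropy in nats (the Trainer's usual reporting convention, assumed here), these losses convert directly to perplexities:

```python
# Quick conversion of the reported validation losses to perplexities,
# assuming they are mean token-level cross-entropies in nats.
import math

initial_loss = 1.2335  # step 0, before any updates
final_loss = 0.9531    # step 375, end of the single epoch

print(f"initial perplexity: {math.exp(initial_loss):.2f}")  # ~3.43
print(f"final perplexity:   {math.exp(final_loss):.2f}")    # ~2.59
```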

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model details

  • Format: Safetensors
  • Model size: 9.24B params
  • Tensor type: BF16