train_2025-04-13-11-42-17

This model is a fine-tuned version of Qwen/Qwen2.5-Coder-3B on the hccsri dataset. No final evaluation summary is listed; the last validation loss logged during training was 3.0146 at step 2300 (epoch 0.9684).

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 4
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
num_epochs: 1.0
mixed_precision_training: Native AMP
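
As a rough illustration of how these values relate, the sketch below derives total_train_batch_size from the per-device batch size and gradient accumulation, and evaluates a plain cosine learning-rate decay. It is not the training script used for this model. The single-device setup, the absence of warmup, and the step count estimated from the results table are assumptions, and the names TrainConfig, total_train_batch_size, and cosine_lr are hypothetical helpers introduced only for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values copied from the hyperparameter list above.
    learning_rate: float = 5e-05
    train_batch_size: int = 2
    gradient_accumulation_steps: int = 2
    # Assumptions (not stated in the card): one device, no warmup.
    num_devices: int = 1
    warmup_steps: int = 0


def total_train_batch_size(cfg: TrainConfig) -> int:
    """Samples contributing to each optimizer update."""
    return cfg.train_batch_size * cfg.gradient_accumulation_steps * cfg.num_devices


def cosine_lr(step: int, total_steps: int, cfg: TrainConfig) -> float:
    """Half-cosine decay from learning_rate to 0, with optional linear warmup."""
    if step < cfg.warmup_steps:
        return cfg.learning_rate * step / max(1, cfg.warmup_steps)
    progress = (step - cfg.warmup_steps) / max(1, total_steps - cfg.warmup_steps)
    progress = min(1.0, progress)
    return cfg.learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


cfg = TrainConfig()

# Assumption: steps per epoch estimated from the results table,
# where step 100 is logged at epoch 0.0421.
approx_total_steps = round(100 / 0.0421)

print("total_train_batch_size:", total_train_batch_size(cfg))
print("lr at step 0:", cosine_lr(0, approx_total_steps, cfg))
print("lr at final step:", cosine_lr(approx_total_steps, approx_total_steps, cfg))
```

With 2 samples per device and 2 accumulation steps on one device, the product matches the listed total_train_batch_size of 4.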

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
3.3828	0.0421	100	3.3995	12336
3.7301	0.0842	200	3.3119	25168
2.7913	0.1263	300	3.2942	38256
2.9739	0.1684	400	3.2371	51344
2.4265	0.2105	500	3.2291	64896
3.2421	0.2526	600	3.1979	77584
3.1406	0.2947	700	3.1769	89952
3.1426	0.3368	800	3.1570	102976
3.101	0.3789	900	3.1375	115536
3.3436	0.4211	1000	3.1259	129104
3.2059	0.4632	1100	3.0889	142176
3.2607	0.5053	1200	3.0811	155360
2.8431	0.5474	1300	3.0604	168080
3.3622	0.5895	1400	3.0450	181424
2.2921	0.6316	1500	3.0402	193696
3.1937	0.6737	1600	3.0319	206800
3.2635	0.7158	1700	3.0285	219920
2.9374	0.7579	1800	3.0246	232224
3.3592	0.8	1900	3.0196	244976
3.1163	0.8421	2000	3.0173	256912
2.8533	0.8842	2100	3.0168	270112
3.6021	0.9263	2200	3.0151	282512
3.2839	0.9684	2300	3.0146	295680
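
The logged values can be summarized with a few lines of arithmetic. The sketch below uses only the first and last rows of the table above. It assumes the validation loss is a mean per-token cross-entropy in nats, so that exp(loss) can be read as perplexity; the card does not state this. The helper name perplexity is hypothetical.

```python
import math

# (step, validation_loss, input_tokens_seen) from the first and last table rows.
first = (100, 3.3995, 12336)
last = (2300, 3.0146, 295680)


def perplexity(loss: float) -> float:
    # Assumption: loss is mean token-level cross-entropy in nats.
    return math.exp(loss)


# Absolute drop in validation loss over the logged range.
loss_drop = first[1] - last[1]

# Average input tokens consumed per optimizer step between the two rows.
tokens_per_step = (last[2] - first[2]) / (last[0] - first[0])

print(f"validation loss drop: {loss_drop:.4f}")
print(f"perplexity at step {first[0]}: {perplexity(first[1]):.2f}")
print(f"perplexity at step {last[0]}: {perplexity(last[1]):.2f}")
print(f"average input tokens per step: {tokens_per_step:.1f}")
```

The training loss column is noisy from step to step because each value reflects only a few samples, so the validation loss is the more reliable indicator of progress here.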