# qwen25-32b-ft-009
This model is a fine-tuned version of [google/gemma-3-27b-it](https://huggingface.co/google/gemma-3-27b-it) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.7923
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: paged_adamw_8bit with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 10
- mixed_precision_training: Native AMP
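For reference, here is a minimal sketch of how these hyperparameters map onto `transformers.TrainingArguments`. The `output_dir` is a hypothetical placeholder, and the model/dataset wiring is omitted because the card does not specify either:

```python
from transformers import TrainingArguments

# Configuration sketch matching the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="qwen25-32b-ft-009",   # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=16,   # effective train batch size: 1 x 16 = 16
    optim="paged_adamw_8bit",         # requires bitsandbytes
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=10,
    fp16=True,                        # native AMP mixed-precision training
)
```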
### Training results
| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 24.5822       | 1.0754 | 100  | 1.5815          |
| 20.4926       | 2.1508 | 200  | 1.3543          |
| 17.5121       | 3.2263 | 300  | 1.2104          |
| 15.3986       | 4.3017 | 400  | 1.0824          |
| 13.8474       | 5.3771 | 500  | 0.9831          |
| 11.4366       | 6.4525 | 600  | 0.9011          |
| 11.0434       | 7.5279 | 700  | 0.8486          |
| 10.5674       | 8.6034 | 800  | 0.8092          |
| 10.0001       | 9.6788 | 900  | 0.7923          |
### Framework versions
- PEFT 0.15.2
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 2.14.4
- Tokenizers 0.21.1
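Since the PEFT version listed above indicates this repository contains an adapter rather than full model weights, inference requires attaching the adapter to the base model. A minimal sketch, assuming the adapter is published under this repo's id (hypothetical here) and that the fine-tune targeted causal language modeling:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "google/gemma-3-27b-it"
adapter_id = "qwen25-32b-ft-009"  # hypothetical repo id for this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires accelerate
)
# Attach the trained PEFT adapter weights on top of the base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = "Explain gradient accumulation in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```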