fine_tuned_per_domain_balanced_moe_c10

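This model is a fine-tuned version of Qwen/Qwen1.5-MoE-A2.7B. The training dataset is not specified (the card records it as None). Summary metrics for the evaluation set are not listed; per-checkpoint results appear in the training results table at the end of this card.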

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 1
eval_batch_size: 1
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 3
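
To make the `lr_scheduler_type: linear` setting concrete, the sketch below computes the learning rate at a given step under a linear decay from `learning_rate: 2e-05` to zero. It is an illustration, not the training code: `linear_lr` is a hypothetical helper, the card lists no warmup so zero warmup steps are assumed, and the total step count is only estimated from the rounded epoch value logged at step 3000 in the results table.

```python
def linear_lr(step: int, total_steps: int,
              base_lr: float = 2e-5, warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero.

    Hypothetical helper for illustration. warmup_steps=0 is an assumption,
    since the card does not list a warmup setting.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Rough estimate only: the table logs step 3000 at epoch 0.0190 (rounded).
steps_per_epoch = round(3000 / 0.0190)
total_steps = 3 * steps_per_epoch  # num_epochs: 3

for step in (0, 3000, total_steps // 2, total_steps):
    print(step, linear_lr(step, total_steps))
```

Under these assumptions the rate has barely moved from 2e-05 by step 3000, because the logged checkpoints cover only about 2% of the first epoch.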

Training Loss	Epoch	Step	Accuracy	Validation Loss
7.9537	0.0006	100	0.5384	4.2406
2.7142	0.0013	200	0.5386	6.1312
2.5969	0.0019	300	0.4651	1.0811
3.6087	0.0025	400	0.4655	1.7135
3.217	0.0032	500	0.5386	2.4567
2.0844	0.0038	600	0.4614	3.8137
3.0955	0.0044	700	0.5386	1.2668
2.0157	0.0051	800	0.5386	3.2796
2.4513	0.0057	900	0.4614	2.2765
2.482	0.0063	1000	0.5386	0.7492
2.3079	0.0070	1100	0.5386	1.6933
2.5698	0.0076	1200	0.5386	3.1721
2.4214	0.0082	1300	0.5386	1.7702
1.2708	0.0089	1400	0.4646	0.9111
0.8665	0.0095	1500	0.5494	0.6819
1.7844	0.0101	1600	0.5386	1.7757
2.9675	0.0108	1700	0.5386	2.7387
2.7119	0.0114	1800	0.5386	2.6287
2.526	0.0120	1900	0.5386	1.4967
3.2745	0.0127	2000	0.4614	4.2874
3.4052	0.0133	2100	0.4624	1.0082
1.7179	0.0139	2200	0.4666	1.6046
2.7225	0.0146	2300	0.5376	3.3510
2.2919	0.0152	2400	0.5376	3.3149
1.729	0.0158	2500	0.5376	2.1687
2.5072	0.0165	2600	0.5376	2.9068
1.9138	0.0171	2700	0.4624	1.4200
1.4881	0.0177	2800	0.4631	2.2129
2.031	0.0184	2900	0.5370	2.2580
1.998	0.0190	3000	0.5374	2.2149
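
Because validation loss swings widely between checkpoints, it can help to pick a checkpoint programmatically rather than by eye. The following is a minimal sketch that parses a few rows copied from the table (steps 1000 to 1500) and selects the one with the lowest validation loss. `EvalRow` and `parse_rows` are hypothetical names introduced for this example, and the column order is assumed to match the table header.

```python
from typing import List, NamedTuple


class EvalRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    accuracy: float
    val_loss: float


# Rows copied from the results table (steps 1000 to 1500).
LOG = """
2.482   0.0063  1000  0.5386  0.7492
2.3079  0.0070  1100  0.5386  1.6933
2.5698  0.0076  1200  0.5386  3.1721
2.4214  0.0082  1300  0.5386  1.7702
1.2708  0.0089  1400  0.4646  0.9111
0.8665  0.0095  1500  0.5494  0.6819
"""


def parse_rows(text: str) -> List[EvalRow]:
    """Parse whitespace-separated rows in the table's column order."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, accuracy, val_loss = line.split()
        rows.append(EvalRow(float(train_loss), float(epoch), int(step),
                            float(accuracy), float(val_loss)))
    return rows


rows = parse_rows(LOG)
best = min(rows, key=lambda r: r.val_loss)
print(best.step, best.val_loss, best.accuracy)
```

Selecting on validation loss rather than accuracy is a design choice for this sketch: the logged accuracy mostly alternates between two values near 0.46 and 0.54, so it separates checkpoints poorly.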