fine_tuned_per_domain_balanced_moe_c10

This model is a fine-tuned version of Qwen/Qwen1.5-MoE-A2.7B on an unspecified dataset (the training metadata records the dataset as None). It achieves the following results on the evaluation set:

  • Loss: 2.2149
  • Accuracy: 0.5374

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 3
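
As a rough illustration of what `lr_scheduler_type: linear` implies, the sketch below computes a learning rate that decays linearly from the listed `learning_rate` to zero. It assumes no warmup (the card lists none) and a hypothetical `total_steps` value; the function name `linear_lr` and the step count are illustrative and do not come from the card.

```python
def linear_lr(step: int, total_steps: int,
              base_lr: float = 2e-05, warmup_steps: int = 0) -> float:
    """Linear warmup (optional) followed by linear decay to zero.

    Illustrative only: base_lr matches the card's learning_rate,
    while total_steps and warmup_steps are assumptions.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay from base_lr to 0 over the remaining steps, clamped at 0.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    total = 10_000  # hypothetical total number of optimizer steps
    for s in (0, 2_500, 5_000, 10_000):
        print(s, linear_lr(s, total))
```

With a linear schedule and no warmup, the rate is at its maximum on the first step and reaches zero only at the final planned step, so a run stopped early never sees the low end of the schedule.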

Training results

| Training Loss | Epoch | Step | Accuracy | Validation Loss |
|---|---|---|---|---|
| 7.9537 | 0.0006 | 100 | 0.5384 | 4.2406 |
| 2.7142 | 0.0013 | 200 | 0.5386 | 6.1312 |
| 2.5969 | 0.0019 | 300 | 0.4651 | 1.0811 |
| 3.6087 | 0.0025 | 400 | 0.4655 | 1.7135 |
| 3.217 | 0.0032 | 500 | 0.5386 | 2.4567 |
| 2.0844 | 0.0038 | 600 | 0.4614 | 3.8137 |
| 3.0955 | 0.0044 | 700 | 0.5386 | 1.2668 |
| 2.0157 | 0.0051 | 800 | 0.5386 | 3.2796 |
| 2.4513 | 0.0057 | 900 | 0.4614 | 2.2765 |
| 2.482 | 0.0063 | 1000 | 0.5386 | 0.7492 |
| 2.3079 | 0.0070 | 1100 | 0.5386 | 1.6933 |
| 2.5698 | 0.0076 | 1200 | 0.5386 | 3.1721 |
| 2.4214 | 0.0082 | 1300 | 0.5386 | 1.7702 |
| 1.2708 | 0.0089 | 1400 | 0.4646 | 0.9111 |
| 0.8665 | 0.0095 | 1500 | 0.5494 | 0.6819 |
| 1.7844 | 0.0101 | 1600 | 0.5386 | 1.7757 |
| 2.9675 | 0.0108 | 1700 | 0.5386 | 2.7387 |
| 2.7119 | 0.0114 | 1800 | 0.5386 | 2.6287 |
| 2.526 | 0.0120 | 1900 | 0.5386 | 1.4967 |
| 3.2745 | 0.0127 | 2000 | 0.4614 | 4.2874 |
| 3.4052 | 0.0133 | 2100 | 0.4624 | 1.0082 |
| 1.7179 | 0.0139 | 2200 | 0.4666 | 1.6046 |
| 2.7225 | 0.0146 | 2300 | 0.5376 | 3.3510 |
| 2.2919 | 0.0152 | 2400 | 0.5376 | 3.3149 |
| 1.729 | 0.0158 | 2500 | 0.5376 | 2.1687 |
| 2.5072 | 0.0165 | 2600 | 0.5376 | 2.9068 |
| 1.9138 | 0.0171 | 2700 | 0.4624 | 1.4200 |
| 1.4881 | 0.0177 | 2800 | 0.4631 | 2.2129 |
| 2.031 | 0.0184 | 2900 | 0.5370 | 2.2580 |
| 1.998 | 0.0190 | 3000 | 0.5374 | 2.2149 |
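
The reported evaluation numbers correspond to the last logged step (3000), not to the lowest validation loss in the log. As a minimal sketch of how one might compare evaluation points, the snippet below picks the entry with the lowest validation loss from a few rows copied from the table above. The `EvalPoint` type and `best_by_val_loss` helper are hypothetical names, and selecting by validation loss is an assumption: the card does not say which checkpoint was kept.

```python
from typing import Iterable, NamedTuple


class EvalPoint(NamedTuple):
    step: int
    accuracy: float
    val_loss: float


# A subset of rows copied from the training results table.
EVALS = [
    EvalPoint(step=1000, accuracy=0.5386, val_loss=0.7492),
    EvalPoint(step=1400, accuracy=0.4646, val_loss=0.9111),
    EvalPoint(step=1500, accuracy=0.5494, val_loss=0.6819),
    EvalPoint(step=3000, accuracy=0.5374, val_loss=2.2149),
]


def best_by_val_loss(points: Iterable[EvalPoint]) -> EvalPoint:
    """Return the evaluation point with the lowest validation loss."""
    return min(points, key=lambda p: p.val_loss)


best = best_by_val_loss(EVALS)
print(f"lowest validation loss at step {best.step}: "
      f"loss={best.val_loss}, accuracy={best.accuracy}")
```

Among these rows, step 1500 has both the lowest validation loss and the highest accuracy, while the final step 3000 is the one summarized at the top of the card.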

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.6.0+cu126
  • Datasets 3.3.2
  • Tokenizers 0.21.0

Safetensors

  • Model size: 14B params
  • Tensor type: F32
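
For a sense of scale, the listed model size (14B params) and tensor type (F32) give a back-of-the-envelope estimate of the weight footprint. The sketch below is an approximation under stated assumptions: the parameter count is the rounded figure shown above, the helper `weight_bytes` is a hypothetical name, and the estimate covers weights only (no activations, optimizer state, or framework overhead).

```python
# Bytes per element for a few common tensor types.
BYTES_PER_DTYPE = {"F32": 4, "F16": 2, "BF16": 2}


def weight_bytes(num_params: float, tensor_type: str = "F32") -> float:
    """Approximate size of the raw weights in bytes."""
    return num_params * BYTES_PER_DTYPE[tensor_type]


if __name__ == "__main__":
    n_params = 14e9  # rounded parameter count as listed (14B)
    for dtype in ("F32", "BF16"):
        gb = weight_bytes(n_params, dtype) / 1e9
        print(f"{dtype}: about {gb:.0f} GB of weights")
```

Under these assumptions the F32 weights come to roughly 56 GB, and casting to a 16-bit type would halve that.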

Model tree for EndLessTime/fine_tuned_per_domain_balanced_moe_c10

  • Base model: Qwen/Qwen1.5-MoE-A2.7B
  • This model: fine-tuned from the base model