Qwen3-30B-A3B-alpaca-th-52k-dolly-th-15k-wangchan-instruct

This model is a fine-tuned version of Qwen/Qwen3-30B-A3B on the alpaca-th-52k, dolly-th-15k, and wangchan-instruct datasets. It achieves the following results on the evaluation set:

  • Loss: 0.6631

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 64
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 1024
  • total_eval_batch_size: 128
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3.0
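
As a consistency check, the total batch sizes follow from the per-device values: 2 × 64 × 8 = 1024 examples per optimizer step for training, and 2 × 64 = 128 for evaluation (gradient accumulation does not apply to evaluation). The Python sketch below reproduces that arithmetic and approximates the learning-rate curve implied by lr_scheduler_type: cosine with lr_scheduler_warmup_ratio: 0.1. It assumes the schedule has the linear-warmup-then-cosine-decay shape used by Hugging Face Transformers, and it assumes about 87 optimizer steps per epoch (261 in total), a figure read off the Epoch and Step columns of the training results rather than stated in the card. The helper names are illustrative only.

```python
import math

# Values copied from the hyperparameter list above.
TRAIN_BATCH_SIZE = 2   # per device
EVAL_BATCH_SIZE = 2    # per device
NUM_DEVICES = 64
GRAD_ACCUM_STEPS = 8
PEAK_LR = 2e-4
WARMUP_RATIO = 0.1
NUM_EPOCHS = 3

# Assumption: about 87 optimizer steps per epoch, inferred from the
# Epoch/Step columns of the training-results table (step 10 ~ epoch 0.1149).
STEPS_PER_EPOCH = 87


def effective_batch_sizes() -> tuple[int, int]:
    """Return (total_train_batch_size, total_eval_batch_size)."""
    total_train = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS
    # Evaluation runs one forward pass per batch; no accumulation.
    total_eval = EVAL_BATCH_SIZE * NUM_DEVICES
    return total_train, total_eval


def cosine_lr_with_warmup(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then a half-cycle cosine decay to 0."""
    warmup_steps = math.ceil(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total_steps = STEPS_PER_EPOCH * NUM_EPOCHS
    print("effective batch sizes (train, eval):", effective_batch_sizes())
    for step in (0, 13, 27, 144, 200, total_steps):
        print(step, f"{cosine_lr_with_warmup(step, total_steps):.2e}")
```

Under these assumptions, warmup covers the first 27 steps (ceil(0.1 × 261)), the rate peaks at 2e-4 at step 27, falls to half of that at step 144, and reaches 0 at the final step.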

Training results

Training Loss  Epoch   Step  Validation Loss
0.9351         0.1149  10    0.9997
0.8087         0.2299  20    0.8204
0.7724         0.3448  30    0.7787
0.7386         0.4598  40    0.7544
0.7351         0.5747  50    0.7382
0.7431         0.6897  60    0.7254
0.7183         0.8046  70    0.7151
0.711          0.9195  80    0.7065
0.6909         1.0345  90    0.6995
0.6893         1.1494  100   0.6939
0.6796         1.2644  110   0.6874
0.65           1.3793  120   0.6812
0.6615         1.4943  130   0.6775
0.6555         1.6092  140   0.6739
0.6522         1.7241  150   0.6713
0.6545         1.8391  160   0.6687
0.648          1.9540  170   0.6668
0.6285         2.0690  180   0.6663
0.6652         2.1839  190   0.6655
0.6307         2.2989  200   0.6647
0.6383         2.4138  210   0.6641
0.6394         2.5287  220   0.6636
0.632          2.6437  230   0.6632
0.6416         2.7586  240   0.6631
0.6228         2.8736  250   0.6631
0.6316         2.9885  260   0.6630
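
The table can also be summarized programmatically. The Python sketch below copies the rows verbatim, estimates steps per epoch from the Step and Epoch columns, checks that validation loss never increases between evaluations, and converts loss to perplexity. The perplexity conversion assumes the reported loss is the mean token-level cross-entropy in nats, which is the usual convention for causal language models in Transformers but is not stated in the card. Function names are illustrative.

```python
import math

# Rows copied verbatim from the training-results table:
# training loss, epoch, step, validation loss.
TABLE = """\
0.9351 0.1149 10 0.9997
0.8087 0.2299 20 0.8204
0.7724 0.3448 30 0.7787
0.7386 0.4598 40 0.7544
0.7351 0.5747 50 0.7382
0.7431 0.6897 60 0.7254
0.7183 0.8046 70 0.7151
0.711 0.9195 80 0.7065
0.6909 1.0345 90 0.6995
0.6893 1.1494 100 0.6939
0.6796 1.2644 110 0.6874
0.65 1.3793 120 0.6812
0.6615 1.4943 130 0.6775
0.6555 1.6092 140 0.6739
0.6522 1.7241 150 0.6713
0.6545 1.8391 160 0.6687
0.648 1.9540 170 0.6668
0.6285 2.0690 180 0.6663
0.6652 2.1839 190 0.6655
0.6307 2.2989 200 0.6647
0.6383 2.4138 210 0.6641
0.6394 2.5287 220 0.6636
0.632 2.6437 230 0.6632
0.6416 2.7586 240 0.6631
0.6228 2.8736 250 0.6631
0.6316 2.9885 260 0.6630
"""


def parse_rows(text):
    """Return a list of (train_loss, epoch, step, val_loss) tuples."""
    rows = []
    for line in text.strip().splitlines():
        train, epoch, step, val = line.split()
        rows.append((float(train), float(epoch), int(step), float(val)))
    return rows


def steps_per_epoch(rows):
    """Estimate optimizer steps per epoch from the last logged row."""
    _, epoch, step, _ = rows[-1]
    return step / epoch


def validation_never_increases(rows):
    """True if each validation loss is <= the one logged before it."""
    vals = [r[3] for r in rows]
    return all(later <= earlier for earlier, later in zip(vals, vals[1:]))


def best_validation(rows):
    """Return (step, val_loss) for the lowest validation loss (earliest on ties)."""
    best = min(rows, key=lambda r: r[3])
    return best[2], best[3]


def perplexity(loss):
    """exp(loss); assumes mean token-level cross-entropy in nats."""
    return math.exp(loss)


if __name__ == "__main__":
    rows = parse_rows(TABLE)
    step, loss = best_validation(rows)
    print(f"rows: {len(rows)}")
    print(f"steps per epoch: {steps_per_epoch(rows):.1f}")
    print(f"validation loss never increases: {validation_never_increases(rows)}")
    print(f"best validation loss {loss} at step {step}, perplexity {perplexity(loss):.3f}")
```

On these rows the lowest logged validation loss is 0.6630 at step 260, within 0.0001 of the 0.6631 reported at the top of the card; under the cross-entropy assumption above, that corresponds to a perplexity of roughly 1.94.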

Framework versions

  • PEFT 0.15.2
  • Transformers 4.52.3
  • Pytorch 2.7.0+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.1
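
To reproduce the training environment, it can help to compare installed packages against the versions listed above. The Python sketch below uses only the standard library (importlib.metadata). It assumes the PyPI distribution names peft, transformers, torch, datasets, and tokenizers correspond to the listed frameworks, and it compares only the public part of each version so that a local build label such as +cu126 (which depends on the CUDA build) is not reported as a mismatch. The helper names are hypothetical.

```python
from importlib import metadata

# Versions from the "Framework versions" list, keyed by the PyPI
# distribution names assumed to correspond to each framework.
EXPECTED = {
    "peft": "0.15.2",
    "transformers": "4.52.3",
    "torch": "2.7.0+cu126",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def public_version(version):
    """Drop a local version label, e.g. '2.7.0+cu126' -> '2.7.0'."""
    return version.split("+", 1)[0]


def mismatches(expected, installed):
    """Return {name: (expected, installed)} for missing or differing packages."""
    report = {}
    for name, want in expected.items():
        have = installed.get(name)
        if have is None or public_version(have) != public_version(want):
            report[name] = (want, have)
    return report


if __name__ == "__main__":
    report = mismatches(EXPECTED, installed_versions(EXPECTED))
    if not report:
        print("all listed framework versions match")
    for name, (want, have) in sorted(report.items()):
        print(f"{name}: expected {want}, found {have or 'not installed'}")
```

Comparing exact strings instead would flag, for example, a CPU-only torch 2.7.0 build as different from 2.7.0+cu126; whether that matters depends on whether GPU kernels are part of what is being reproduced.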

Safetensors

  • Model size: 30.5B params
  • Tensor type: F32

Model tree for airesearch/Qwen3-30B-A3B-alpaca-th-52k-dolly-th-15k-wangchan-instruct

  • Finetuned: Qwen/Qwen3-30B-A3B
  • Adapter (5): this model