Qwen3-32B-alpaca-th-52k-dolly-th-15k-wangchan-instruct

This model is a fine-tuned version of Qwen/Qwen3-32B on the alpaca-th-52k, dolly-th-15k, and wangchan-instruct datasets. It achieves the following results on the evaluation set:

  • Loss: 0.6417
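
Per the framework versions below, the checkpoint is published as a PEFT adapter on top of Qwen/Qwen3-32B rather than as merged weights. A minimal loading-and-inference sketch, assuming the peft and transformers versions listed in this card; the Thai prompt is only an illustration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the frozen base model, then attach the fine-tuned adapter weights.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-32B", torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(
    base, "airesearch/Qwen3-32B-alpaca-th-52k-dolly-th-15k-wangchan-instruct"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")

# The fine-tuning data is Thai instruction data, so a Thai prompt is a natural test.
messages = [{"role": "user", "content": "อธิบายการทำงานของโมเดลภาษาขนาดใหญ่แบบสั้น ๆ"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```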

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 32
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 512
  • total_eval_batch_size: 64
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3.0
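
Note that the total train batch size follows from the list above: 2 per device × 32 devices × 8 accumulation steps = 512. A minimal sketch of the equivalent transformers.TrainingArguments, assuming a standard Trainer setup; output_dir is a placeholder:

```python
from transformers import TrainingArguments

# Mirrors the hyperparameter list above.
training_args = TrainingArguments(
    output_dir="qwen3-32b-th-instruct",  # hypothetical path
    learning_rate=2e-4,
    per_device_train_batch_size=2,       # x 32 GPUs x 8 accumulation steps = 512 effective
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",                 # AdamW; betas=(0.9, 0.999), eps=1e-08 are the defaults
    seed=42,
)
```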

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| 0.9564 | 0.0575 | 10 | 1.0507 |
| 0.806 | 0.1149 | 20 | 0.8268 |
| 0.7551 | 0.1724 | 30 | 0.7598 |
| 0.7158 | 0.2299 | 40 | 0.7396 |
| 0.7217 | 0.2874 | 50 | 0.7252 |
| 0.7078 | 0.3448 | 60 | 0.7130 |
| 0.6719 | 0.4023 | 70 | 0.7029 |
| 0.6855 | 0.4598 | 80 | 0.6964 |
| 0.7328 | 0.5172 | 90 | 0.6907 |
| 0.6663 | 0.5747 | 100 | 0.6848 |
| 0.7049 | 0.6322 | 110 | 0.6792 |
| 0.6772 | 0.6897 | 120 | 0.6751 |
| 0.687 | 0.7471 | 130 | 0.6721 |
| 0.6786 | 0.8046 | 140 | 0.6700 |
| 0.6389 | 0.8621 | 150 | 0.6672 |
| 0.6673 | 0.9195 | 160 | 0.6649 |
| 0.6711 | 0.9770 | 170 | 0.6633 |
| 0.6614 | 1.0345 | 180 | 0.6615 |
| 0.6219 | 1.0920 | 190 | 0.6602 |
| 0.6542 | 1.1494 | 200 | 0.6587 |
| 0.6596 | 1.2069 | 210 | 0.6572 |
| 0.6526 | 1.2644 | 220 | 0.6567 |
| 0.657 | 1.3218 | 230 | 0.6551 |
| 0.6124 | 1.3793 | 240 | 0.6537 |
| 0.6489 | 1.4368 | 250 | 0.6526 |
| 0.614 | 1.4943 | 260 | 0.6515 |
| 0.656 | 1.5517 | 270 | 0.6504 |
| 0.6255 | 1.6092 | 280 | 0.6492 |
| 0.6419 | 1.6667 | 290 | 0.6486 |
| 0.6275 | 1.7241 | 300 | 0.6473 |
| 0.6324 | 1.7816 | 310 | 0.6466 |
| 0.6334 | 1.8391 | 320 | 0.6461 |
| 0.6213 | 1.8966 | 330 | 0.6452 |
| 0.6269 | 1.9540 | 340 | 0.6443 |
| 0.6408 | 2.0115 | 350 | 0.6437 |
| 0.6213 | 2.0690 | 360 | 0.6441 |
| 0.6146 | 2.1264 | 370 | 0.6440 |
| 0.6572 | 2.1839 | 380 | 0.6438 |
| 0.6264 | 2.2414 | 390 | 0.6435 |
| 0.6051 | 2.2989 | 400 | 0.6434 |
| 0.5983 | 2.3563 | 410 | 0.6429 |
| 0.6388 | 2.4138 | 420 | 0.6425 |
| 0.6227 | 2.4713 | 430 | 0.6425 |
| 0.6335 | 2.5287 | 440 | 0.6421 |
| 0.6247 | 2.5862 | 450 | 0.6420 |
| 0.6404 | 2.6437 | 460 | 0.6418 |
| 0.6218 | 2.7011 | 470 | 0.6418 |
| 0.6368 | 2.7586 | 480 | 0.6417 |
| 0.6191 | 2.8161 | 490 | 0.6417 |
| 0.6234 | 2.8736 | 500 | 0.6417 |
| 0.6079 | 2.9310 | 510 | 0.6417 |
| 0.6243 | 2.9885 | 520 | 0.6417 |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.52.3
  • PyTorch 2.7.0+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.1
