Malaysian Qwen3-14B

Continued finetuning of https://huggingface.co/Qwen/Qwen3-14B on a highly curated 1.5B-token Malaysian instruction dataset.

Improvements

  1. Supports responding in Mandarin, Tamil, Jawi, Manglish, and local dialects of Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan and Terengganu.
  2. Able to code in Mandarin, Tamil, Jawi, Manglish, and local dialects of Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan and Terengganu.
  3. Handles multi-turn Malaysian context, such as topics related to Malaysian legislation, politics, religion and languages.

Training session

Finetuned on mesolitica/Malaysian-SFT to make the model understand Malaysian context.
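
A minimal sketch of pulling and inspecting the dataset with the Hugging Face `datasets` library; the split name and column layout are assumptions, so check the dataset card for the actual schema.

```python
# Minimal sketch: load the Malaysian SFT mixture and inspect one row.
# The "train" split name and the column layout are assumptions —
# check the dataset card for the real schema.
from datasets import load_dataset

dataset = load_dataset("mesolitica/Malaysian-SFT", split="train")
print(dataset)     # number of rows and column names
print(dataset[0])  # one raw example
```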

How we train

  1. LoRA on ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"] (a configuration sketch follows this list).
  2. Rank 256 with alpha 512, i.e. an alpha/rank ratio of 2.0.
  3. Multipacking at 8192 context length with proper SDPA causal masking to prevent cross-document contamination, and with proper position IDs (a packing sketch follows the source-code link below).
  4. Chunked Cut Cross-Entropy (CCE) loss for LoRA.
  5. WandB logs at https://wandb.ai/huseinzol05/lora-embedding-256-qwen3-14b-malaysian-8k
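
A minimal PEFT sketch of the LoRA setup above (rank 256, alpha 512, module names from item 1); anything not stated in this card, such as dropout or dtype choices, is an assumption, and the exact training code lives in the repository linked below.

```python
# Minimal sketch of the LoRA configuration described above:
# rank 256, alpha 512, applied to attention, MLP, embedding and LM-head modules.
# Dropout and dtype choices are assumptions, not taken from the card.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-14B",
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=256,
    lora_alpha=512,    # alpha / rank = 2.0
    lora_dropout=0.0,  # assumption
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens", "lm_head",
    ],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```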

Source code at https://github.com/mesolitica/malaya/tree/master/session/qwen3
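
For the multipacking in step 3 above, here is a minimal illustration of restarting position IDs at every document boundary and building a block-diagonal boolean SDPA mask so packed documents cannot attend to each other; this is only a sketch of the idea, not the exact code used in training.

```python
# Minimal illustration of multipacking: concatenate several tokenized documents
# into one sequence, restart position_ids at every document boundary, and build
# a block-diagonal causal mask so tokens never attend across documents.
# This is a sketch of the idea, not the exact training code from the repository.
import torch

def pack_documents(docs, max_len=8192):
    input_ids, position_ids, doc_ids = [], [], []
    for doc_idx, doc in enumerate(docs):
        input_ids.extend(doc)
        position_ids.extend(range(len(doc)))  # positions restart per document
        doc_ids.extend([doc_idx] * len(doc))
    input_ids = input_ids[:max_len]
    position_ids = position_ids[:max_len]
    doc_ids = doc_ids[:max_len]

    doc_ids_t = torch.tensor(doc_ids)
    same_doc = doc_ids_t[:, None] == doc_ids_t[None, :]
    causal = torch.tril(torch.ones(len(input_ids), len(input_ids), dtype=torch.bool))
    # True = attention allowed; usable as a boolean SDPA attention mask.
    attention_mask = same_doc & causal
    return torch.tensor(input_ids), torch.tensor(position_ids), attention_mask

# Two short "documents" packed into one sequence.
ids, pos, mask = pack_documents([[101, 5, 6, 7], [101, 9, 10]])
print(pos.tolist())  # [0, 1, 2, 3, 0, 1, 2]
```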

Benchmark

MalayMMLU

Probability next tokens

Based on the official 0-shot MalayMMLU first-token accuracy:


While for the original model:


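As an illustration of the first-token scoring above, here is a minimal sketch that compares the model's next-token probabilities for the answer letters and picks the most likely one; the prompt format and answer-letter handling are assumptions, and the official MalayMMLU harness defines the exact protocol.

```python
# Minimal sketch of first-token accuracy scoring: compare the next-token
# probabilities of the answer letters and pick the most likely one.
# Prompt format and answer-letter tokenization are assumptions; the official
# MalayMMLU evaluation code defines the exact protocol.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mesolitica/Malaysian-Qwen3-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"  # device_map needs accelerate
)

prompt = (
    "Soalan: Apakah ibu negara Malaysia?\n"
    "A. Johor Bahru\nB. Kuala Lumpur\nC. Ipoh\nD. Kuching\n"
    "Jawapan:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # logits for the first answer token

choices = ["A", "B", "C", "D"]
choice_ids = [tokenizer.encode(" " + c, add_special_tokens=False)[0] for c in choices]
probs = torch.softmax(logits, dim=-1)[choice_ids]
print(choices[int(probs.argmax())])  # predicted answer letter
```
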
First token match using vLLM

Based on 0-shot exact first-token match using vLLM Guided Decoding:


While for the original model:


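A minimal sketch of the exact first-token match with vLLM Guided Decoding, using the OpenAI-compatible server and its `guided_choice` extension; the server address, prompt format and answer letters are assumptions, and the benchmark harness defines the real protocol.

```python
# Minimal sketch of exact first-token matching with vLLM Guided Decoding.
# Start the OpenAI-compatible server first, e.g.:
#   vllm serve mesolitica/Malaysian-Qwen3-14B
# The prompt format and answer letters below are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

prompt = (
    "Soalan: Apakah ibu negara Malaysia?\n"
    "A. Johor Bahru\nB. Kuala Lumpur\nC. Ipoh\nD. Kuching\n"
    "Jawapan:"
)

response = client.completions.create(
    model="mesolitica/Malaysian-Qwen3-14B",
    prompt=prompt,
    max_tokens=1,
    temperature=0.0,
    extra_body={"guided_choice": ["A", "B", "C", "D"]},  # constrain output to the choices
)
print(response.choices[0].text)  # predicted answer letter
```
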
Acknowledgement

Special thanks to https://www.sns.com.my and NVIDIA for the 8x H100 node!
