Malaysian Llama-3.2 3B-Instruct v0.2

Continue finetuning meta-llama/Llama-3.2-3B-Instruct on highly curated 1.2B tokens Malaysian instruction.

Improvement

  1. 128k context length.
  2. Support respond in Mandarin, Tamil, Jawi, Manglish, Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan and Terengganu.
  3. Able to code in Mandarin, Tamil, Jawi, Manglish, Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan and Terengganu.
  4. Multi-turn Malaysian context such as related to Malaysian Legislation, politics, religions and languages.
  5. Standard RAG.

MalayMMLU

                             Model   Accuracy   shot by_letter        category
0  Malaysian-Llama-3.2-3B-Instruct  57.552190  0shot      True            STEM
1  Malaysian-Llama-3.2-3B-Instruct  59.605598  0shot      True        Language
2  Malaysian-Llama-3.2-3B-Instruct  58.065915  0shot      True  Social science
3  Malaysian-Llama-3.2-3B-Instruct  57.303910  0shot      True          Others
4  Malaysian-Llama-3.2-3B-Instruct  60.250284  0shot      True      Humanities
{'Social science': 6918, 'Language': 6288, 'Humanities': 4395, 'Others': 4169, 'STEM': 2443}
Model : Malaysian-Llama-3.2-3B-Instruct
Metric : first
Shot : 0shot
average accuracy 58.67922190558791
accuracy for STEM 57.55218993041342
accuracy for Language 59.605597964376585
accuracy for Social science 58.06591500433651
accuracy for Others 57.30390981050611
accuracy for Humanities 60.250284414106936

Training session

Finetune on mesolitica/Malaysian-SFT to make the model understand Malaysian context.

How we train

  1. LoRA on ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"].
  2. 256 Rank with alpha 512, or alpha of 2.0
  3. Multipacking with proper SDPA causal masking to prevent document contamination and also make sure proper position ids.
  4. Forked CCE loss for LoRA lm_head to reduce memory consumption.

Source code at https://github.com/malaysia-ai/cooking/tree/main/llama/sft

Downloads last month
129
Safetensors
Model size
3.61B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for mesolitica/Malaysian-Llama-3.2-3B-Instruct-v0.2

Quantizations
2 models

Dataset used to train mesolitica/Malaysian-Llama-3.2-3B-Instruct-v0.2