Malaysian Llama 3.1 70B Instruct

Continued finetuning of https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct on a highly curated 1.5B-token Malaysian instruction dataset.

Improvement

Supports responding in Mandarin, Tamil, Jawi, Manglish, Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan and Terengganu.
Able to code in Mandarin, Tamil, Jawi, Manglish, Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan and Terengganu.
Handles multi-turn conversations in a Malaysian context, such as Malaysian legislation, politics, religions and languages.

Training session

Finetuned on mesolitica/Malaysian-SFT so that the model understands Malaysian context.

How we train

LoRA on ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"].
Rank 128 with alpha 256, i.e. an alpha-to-rank ratio of 2.0.
Multipacking at 8192 context length with proper SDPA causal masking to prevent cross-document contamination, and with correct position ids for each packed document.
Chunk CCE loss for LoRA.
WandB at https://wandb.ai/huseinzol05/lora-embedding-128-llama3.1-70b-malaysian-8k?nw=nwuserhuseinzol05
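
As a rough illustration of the rank and alpha settings listed above, the sketch below computes the LoRA scaling factor and the number of extra trainable parameters one adapter adds to a single projection. It assumes the conventional `alpha / rank` scaling and a hidden size of 8192 for the 70B model; the helper names are hypothetical and are not taken from the training code.

```python
# Minimal sketch, assuming the conventional LoRA scaling (alpha / rank).
# Helper names are illustrative only, not from the training repository.

def lora_scaling(rank: int, alpha: int) -> float:
    """Factor applied to the low-rank update B @ A."""
    return alpha / rank


def lora_extra_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


RANK = 128
ALPHA = 256
HIDDEN_SIZE = 8192  # assumption: hidden size of Llama 3.1 70B

scaling = lora_scaling(RANK, ALPHA)
q_proj_extra = lora_extra_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)

print(f"scaling = {scaling}")
print(f"extra parameters for one q_proj adapter = {q_proj_extra:,}")
```

With rank 128 and alpha 256 the scaling comes out as 2.0, which is the ratio quoted in the list.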

Source code at https://github.com/mesolitica/malaya/tree/master/session/llama3
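
The multipacking item under "How we train" can be pictured with a small pure-Python sketch. It is not taken from the repository above; it only shows the idea, assuming several documents are concatenated into one sequence, position ids restart at 0 for each document, and a token may attend only to earlier tokens of its own document. The function names are hypothetical.

```python
# Minimal sketch of packing-aware position ids and a block-diagonal causal mask.
# Illustrative only; a real implementation would build tensors for SDPA.
from typing import List


def packed_position_ids(doc_lengths: List[int]) -> List[int]:
    """Position ids restart at 0 at the start of every packed document."""
    return [pos for length in doc_lengths for pos in range(length)]


def packed_causal_mask(doc_lengths: List[int]) -> List[List[bool]]:
    """mask[q][k] is True when query token q may attend to key token k."""
    # Label every token with the index of the document it came from.
    doc_ids = [doc for doc, length in enumerate(doc_lengths) for _ in range(length)]
    total = len(doc_ids)
    return [
        [doc_ids[q] == doc_ids[k] and k <= q for k in range(total)]
        for q in range(total)
    ]


lengths = [3, 2]  # two documents packed into one sequence of 5 tokens
position_ids = packed_position_ids(lengths)
mask = packed_causal_mask(lengths)

print(position_ids)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

The first token of the second document sees only itself, so nothing leaks across the document boundary.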

Benchmark

MalayMMLU

Probability next tokens

Based on 0-shot official MalayMMLU First token accuracy,

                              Model   Accuracy   shot by_letter        category
0  Malaysian-Llama-3.1-70B-Instruct  75.890299  0shot      True            STEM
1  Malaysian-Llama-3.1-70B-Instruct  75.540712  0shot      True        Language
2  Malaysian-Llama-3.1-70B-Instruct  72.260769  0shot      True  Social science
3  Malaysian-Llama-3.1-70B-Instruct  71.863756  0shot      True          Others
4  Malaysian-Llama-3.1-70B-Instruct  78.202503  0shot      True      Humanities
{'Social science': 6918, 'Language': 6288, 'Humanities': 4395, 'Others': 4169, 'STEM': 2443}
Model : Malaysian-Llama-3.1-70B-Instruct
Metric : first
Shot : 0shot
average accuracy 74.48891091562383
accuracy for STEM 75.89029881293492
accuracy for Language 75.54071246819338
accuracy for Social science 72.26076900838393
accuracy for Others 71.86375629647398
accuracy for Humanities 78.20250284414107

While the original model,

                    Model   Accuracy   shot by_letter        category
0  Llama-3.1-70B-Instruct  78.919361  0shot      True            STEM
1  Llama-3.1-70B-Instruct  78.769084  0shot      True        Language
2  Llama-3.1-70B-Instruct  77.262215  0shot      True  Social science
3  Llama-3.1-70B-Instruct  75.269849  0shot      True          Others
4  Llama-3.1-70B-Instruct  82.571104  0shot      True      Humanities
{'Social science': 6918, 'Language': 6288, 'Humanities': 4395, 'Others': 4169, 'STEM': 2443}
Model : Llama-3.1-70B-Instruct
Metric : first
Shot : 0shot
average accuracy 78.44133316813281
accuracy for STEM 78.9193614408514
accuracy for Language 78.76908396946564
accuracy for Social science 77.26221451286499
accuracy for Others 75.26984888462461
accuracy for Humanities 82.57110352673493
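
The `average accuracy` lines in this subsection are consistent with a sample-weighted average of the per-category accuracies, using the category counts printed in the log. The sketch below reproduces the figure for Malaysian-Llama-3.1-70B-Instruct from the numbers above; the function name is hypothetical and the official evaluation script may compute it differently.

```python
# Minimal sketch: sample-weighted average over MalayMMLU categories.
# Counts and accuracies are copied from the log output above.

counts = {
    "Social science": 6918,
    "Language": 6288,
    "Humanities": 4395,
    "Others": 4169,
    "STEM": 2443,
}

accuracy = {
    "STEM": 75.89029881293492,
    "Language": 75.54071246819338,
    "Social science": 72.26076900838393,
    "Others": 71.86375629647398,
    "Humanities": 78.20250284414107,
}


def weighted_average(acc: dict, n: dict) -> float:
    """Average accuracy weighted by the number of questions per category."""
    total = sum(n.values())
    return sum(acc[category] * n[category] for category in n) / total


average = weighted_average(accuracy, counts)
print(f"average accuracy {average}")
```

The result agrees with the reported 74.4889 up to floating-point rounding, so larger categories such as Social science weigh more heavily than STEM.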

First token match using vLLM

Based on 0-shot exact first token match using vLLM Guided Decoding,

                              Model   Accuracy  shot        category
0  Malaysian-Llama-3.1-70B-Instruct  68.686042     0            STEM
1  Malaysian-Llama-3.1-70B-Instruct  69.354326     0        Language
2  Malaysian-Llama-3.1-70B-Instruct  67.620700     0  Social science
3  Malaysian-Llama-3.1-70B-Instruct  65.915088     0          Others
4  Malaysian-Llama-3.1-70B-Instruct  69.897611     0      Humanities
Model : Malaysian-Llama-3.1-70B-Instruct
Metric : full
Shot : 0
average accuracy 68.29802172386735
accuracy for STEM 68.68604175194433
accuracy for Language 69.35432569974554
accuracy for Social science 67.62069962416884
accuracy for Others 65.91508755097145
accuracy for Humanities 69.89761092150171

While the original model,

                    Model   Accuracy  shot        category
0  Llama-3.1-70B-Instruct  76.668031     0            STEM
1  Llama-3.1-70B-Instruct  77.162850     0        Language
2  Llama-3.1-70B-Instruct  74.906042     0  Social science
3  Llama-3.1-70B-Instruct  72.655313     0          Others
4  Llama-3.1-70B-Instruct  78.930603     0      Humanities
Model : Llama-3.1-70B-Instruct
Metric : full
Shot : 0
average accuracy 76.01288563994548
accuracy for STEM 76.66803110929186
accuracy for Language 77.16284987277355
accuracy for Social science 74.90604220873085
accuracy for Others 72.65531302470617
accuracy for Humanities 78.93060295790671
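
For readers unfamiliar with the metric, exact first-token match can be sketched as below. This is a simplified illustration, not the evaluation script: it assumes each gold answer is a single option letter and compares it with the first non-space character of the generated text. The function names and the sample data are hypothetical.

```python
# Minimal sketch of an exact first-token match score.
# Assumption: answers are single option letters such as "A" to "E".
from typing import List


def first_token_match(generated: str, gold: str) -> bool:
    """True when the first non-space character of the output equals the gold letter."""
    stripped = generated.strip()
    return bool(stripped) and stripped[0].upper() == gold.upper()


def accuracy_percent(outputs: List[str], golds: List[str]) -> float:
    """Percentage of outputs whose first token matches the gold answer."""
    if not golds:
        return 0.0
    hits = sum(first_token_match(out, gold) for out, gold in zip(outputs, golds))
    return 100.0 * hits / len(golds)


# Hypothetical model outputs and gold answers, for illustration only.
sample_outputs = ["A", "B. kerana", " c", ""]
sample_golds = ["A", "C", "C", "D"]
score = accuracy_percent(sample_outputs, sample_golds)
print(score)
```

Unlike the next-token probability metric, this score depends on what the model actually generates, which may explain why both models score lower here.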

Acknowledgement

Special thanks to https://www.sns.com.my and Nvidia for the 8x H100 node!
