MaLA Corpus for Massive Language Adaptation of Large Language Models https://mala-lm.github.io
MaLA-LM
community
AI & ML interests: NLP & LLM
Organization Card
Welcome to MaLA-LM (Massive Language Adaptation of Large Language Models)! 🌍
MaLA-LM focuses on adapting large language models to support hundreds of languages, including many underrepresented ones. Our models are multilingual, scalable, and optimized for diverse linguistic tasks.
Featured 🗣️
Check out our multilingual LLM collections, featuring models trained to handle 500+ languages, ideal for global, multilingual applications.
Dive into the collections: EMMA-500 | MaLA corpus | MaLA-500
Join our Discord server 👋
https://discord.com/invite/F5mEb7U6we
Happy building! 🚀
Enhancing massively multilingual adaptation of LLMs on 500+ languages https://mala-lm.github.io
- Massively Multilingual Adaptation of Large Language Models Using Bilingual Translation Data • Paper • 2506.00469 • 2
- MaLA-LM/emma-500-llama3-8b-mono • Text Generation • 8B • 41
- MaLA-LM/emma-500-llama3-8b-bi • Text Generation • 8B • 2.25k
- MaLA-LM/emma-500-llama3.1-8b-mono • Text Generation • 8B • 40
models: 59
- MaLA-LM/lucky52-bloom-7b1-no-32 • Text Generation • 8B • 14
- MaLA-LM/emma-500-llama3.1-8b-bi • Text Generation • 8B • 380
- MaLA-LM/emma-500-llama3-8b-bi • Text Generation • 8B • 2.25k
- MaLA-LM/emma-500-llama3-8b-mono • Text Generation • 8B • 41
- MaLA-LM/emma-500-llama3.1-8b-mono • Text Generation • 8B • 40
- MaLA-LM/lucky52-bloom-7b1-no-3 • Text Generation • 8B • 6
- MaLA-LM/lucky52-bloom-7b1-no-2 • Text Generation • 8B • 10
- MaLA-LM/lucky52-bloom-7b1-no-4 • Text Generation • 8B • 8
- MaLA-LM/lucky52-bloom-7b1-no-5 • Text Generation • 8B • 6
- MaLA-LM/lucky52-bloom-7b1-no-6 • Text Generation • 8B • 6
datasets: 13
- MaLA-LM/mala-bilingual-translation-corpus • Viewer • 14.4B • 1.19k • 5
- MaLA-LM/mala-opus-dedup-2410 • Viewer • 44.3B • 3.61k • 2
- MaLA-LM/mala-code-reasoning-v2 • Viewer • 89.7M • 81 • 2
- MaLA-LM/mala-code-reasoning • Viewer • 44.9M • 52 • 1
- MaLA-LM/mala-monolingual-split • Viewer • 538M • 3.7k • 2
- MaLA-LM/mala-monolingual-filter • Viewer • 1.42B • 12.9k • 2
- MaLA-LM/mala-monolingual-integration • Viewer • 1.14B • 1.19k • 2
- MaLA-LM/mala-monolingual-dedup • Viewer • 969M • 13.1k • 2
- MaLA-LM/mala-opus-dedup-2410-sample • Viewer • 6.48B • 327
- MaLA-LM/mala-opus-dedup-shuffle-2410 • Preview • 1.57k