MaLA Corpus for Massive Language Adaptation of Large Language Models https://mala-lm.github.io
MaLA-LM (community organization)
AI & ML interests: NLP & LLM
Welcome to MaLA-LM (Massive Language Adaptation of Large Language Models)! 🌍
MaLA-LM focuses on adapting large language models to support hundreds of languages, including many underrepresented ones. Our models are multilingual, scalable, and optimized for diverse linguistic tasks.
Featured 🗣️
Check out our multilingual LLM collections, featuring models trained to handle 500+ languages for global multilingual applications.
Dive into the collections: EMMA-500 | MaLA corpus | MaLA-500
Join our Discord server 👋
https://discord.com/invite/F5mEb7U6we
Happy building! 🚀
Enhancing massively multilingual adaptation of LLMs on 500+ languages https://mala-lm.github.io
Paper • 2506.00469 • Massively Multilingual Adaptation of Large Language Models Using Bilingual Translation Data
MaLA-LM/emma-500-llama3-8b-mono • Text Generation • 8B • 31 downloads
MaLA-LM/emma-500-llama3-8b-bi • Text Generation • 8B • 53 downloads
MaLA-LM/emma-500-llama3.1-8b-mono • Text Generation • 8B • 39 downloads
Models (59)
MaLA-LM/emma-500-llama3.1-8b-bi • Text Generation • 8B • 80 downloads
MaLA-LM/emma-500-llama3-8b-bi • Text Generation • 8B • 53 downloads
MaLA-LM/emma-500-llama3-8b-mono • Text Generation • 8B • 31 downloads
MaLA-LM/emma-500-llama3.1-8b-mono • Text Generation • 8B • 39 downloads
MaLA-LM/lucky52-bloom-7b1-no-3 • Text Generation • 8B • 22 downloads
MaLA-LM/lucky52-bloom-7b1-no-2 • Text Generation • 8B • 70 downloads
MaLA-LM/lucky52-bloom-7b1-no-4 • Text Generation • 8B • 56 downloads
MaLA-LM/lucky52-bloom-7b1-no-5 • Text Generation • 8B • 33 downloads
MaLA-LM/lucky52-bloom-7b1-no-6 • Text Generation • 8B • 22 downloads
MaLA-LM/lucky52-bloom-7b1-no-8 • Text Generation • 8B • 28 downloads
Datasets (13)
MaLA-LM/mala-opus-dedup-2410 • 44.3B rows • 11.6k downloads • 1 like
MaLA-LM/mala-code-reasoning-v2 • 89.7M rows • 261 downloads • 3 likes
MaLA-LM/mala-code-reasoning • 44.9M rows • 134 downloads • 1 like
MaLA-LM/mala-monolingual-split • 538M rows • 9.18k downloads • 2 likes
MaLA-LM/mala-monolingual-filter • 1.42B rows • 13.2k downloads • 2 likes
MaLA-LM/mala-monolingual-integration • 1.14B rows • 1.93k downloads • 2 likes
MaLA-LM/mala-monolingual-dedup • 969M rows • 11.5k downloads • 2 likes
MaLA-LM/mala-bilingual-translation-corpus • 14.4B rows • 2.13k downloads • 5 likes
MaLA-LM/mala-opus-dedup-2410-sample • 6.48B rows • 611 downloads
MaLA-LM/mala-opus-dedup-shuffle-2410 • 1.26k downloads