BOTS-LM
This collection hosts the models and datasets released as part of BOTS-LM, the Bilingual Open Tswana Suite of Language Models.
- Paper: arXiv:2408.02239
OxxoCodes/Pula-8B-v0.1
Text Generation
Note: The current (but not final) version of Pula-8B. Currently recommended for most use cases. Excels in translation and reasoning tasks.
OxxoCodes/Pula-XLMR-large-v0.1
Fill-Mask
Note: The current (but not final) version of Pula-XLMR-Large. Currently recommended for most use cases.
OxxoCodes/Medupe
Dataset • 976k rows
Note: Instruction-tuning dataset combining existing instruction datasets, datasets converted to a chat format, translation data, subsets of popular instruction datasets translated with GPT-4o and Gemini 1.5 Pro, and purely synthetic instruction data.
OxxoCodes/Marothodi
Dataset • 152k rows
Note: Contains raw webtext and other documents written in Setswana or code-switched Setswana+English.
OxxoCodes/mmlu-tsn
Dataset • 14k rows
Note: The entirety of the MMLU test split, translated into Setswana with GPT-4o.
OxxoCodes/gsm8k-tsn
Dataset • 1.32k rows
Note: The entirety of the GSM8K test split, translated into Setswana with Gemini 1.5 Pro.
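
The repositories listed above can probably be loaded with the standard Hugging Face tooling. The following is a minimal sketch, not an official usage guide: the repo IDs and task tags are copied from this listing, while the helper names (`hub_url`, `load_model_pipeline`, `load_listed_dataset`) are hypothetical and introduced only for illustration. It assumes the `transformers` and `datasets` packages are installed and that each repo loads with default settings; some datasets may need a configuration name or split passed through the keyword arguments.

```python
# Repo IDs and task tags as they appear in this collection listing.
MODELS = {
    "OxxoCodes/Pula-8B-v0.1": "text-generation",
    "OxxoCodes/Pula-XLMR-large-v0.1": "fill-mask",
}
DATASETS = [
    "OxxoCodes/Medupe",
    "OxxoCodes/Marothodi",
    "OxxoCodes/mmlu-tsn",
    "OxxoCodes/gsm8k-tsn",
]


def hub_url(repo_id: str, is_dataset: bool = False) -> str:
    """Build the Hugging Face Hub page URL for a repo ID (hypothetical helper)."""
    prefix = "datasets/" if is_dataset else ""
    return f"https://huggingface.co/{prefix}{repo_id}"


def load_model_pipeline(repo_id: str):
    """Wrap a listed model in a transformers pipeline (downloads weights)."""
    # Imported lazily so the rest of the sketch runs without transformers.
    from transformers import pipeline

    return pipeline(MODELS[repo_id], model=repo_id)


def load_listed_dataset(repo_id: str, **kwargs):
    """Load a listed dataset; kwargs are forwarded to datasets.load_dataset."""
    # Assumption: default arguments suffice; pass name=... or split=... if not.
    from datasets import load_dataset

    return load_dataset(repo_id, **kwargs)


# Print the Hub pages for everything in the collection (no network needed).
for model_id in MODELS:
    print(hub_url(model_id))
for dataset_id in DATASETS:
    print(hub_url(dataset_id, is_dataset=True))
```

Calling `load_model_pipeline("OxxoCodes/Pula-XLMR-large-v0.1")` would download the model and return a fill-mask pipeline; the 8B model needs substantially more memory, so a GPU is likely required in practice.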