Token Classification · Transformers · Safetensors · xlm-roberta

INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages
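
Usage

A minimal usage sketch for this slot-filling checkpoint via the Transformers token-classification pipeline. The example utterance and the slot labels it returns are illustrative; the actual label set comes from the checkpoint's config.

```python
from transformers import pipeline

# Load this checkpoint as a token-classification (slot-filling) pipeline.
# aggregation_strategy="simple" merges sub-word pieces back into words.
slot_filler = pipeline(
    "token-classification",
    model="McGill-NLP/AfroXLMR-large-76L-Injongo-slot",
    aggregation_strategy="simple",
)

# Each returned entry carries a slot label, a confidence score,
# and the character span of the matched word group.
print(slot_filler("wake me up at nine am on friday"))
```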

Evaluation Comparison

Intent Detection (Accuracy)

In-language training

| Model | eng | amh | ewe | hau | ibo | kin | lin | lug | orm | sna | sot | swa | twi | wol | xho | yor | zul | AVG |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mT5-Large | 80.5 | 91.5 | 77.3 | 94.6 | 92.9 | 83.7 | 91.3 | 83.3 | 73.3 | 92.6 | 80.2 | 95.8 | 85.3 | 91.6 | 95.8 | 90.9 | 82.4 | 87.7±4.1 |
| AfriTeVa V2 (T5) | 81.6 | 93.2 | 84.4 | **98.9** | 95.7 | 87.8 | 91.6 | 86.8 | 86.6 | 94.6 | 85.7 | 96.8 | 87.1 | 94.0 | 97.3 | 97.0 | 89.2 | 91.7±2.7 |
| NLLB LLM2Vec | **88.4** | 94.2 | 87.8 | 98.3 | **96.8** | 89.2 | **95.2** | **93.2** | 86.2 | **96.1** | **87.3** | 97.4 | 93.5 | 95.6 | **97.5** | 97.3 | 89.1 | 93.4±2.3 |
| XLM-RoBERTa | 83.5 | 92.9 | 77.9 | 96.0 | 88.8 | 69.6 | 90.5 | 78.9 | 75.0 | 83.8 | 76.0 | 96.7 | 79.5 | 90.2 | 89.6 | 92.6 | 74.7 | 84.5±4.9 |
| AfriBERTa V2 | 74.2 | 91.2 | 78.3 | 98.2 | 93.8 | 83.1 | 91.0 | 83.8 | 78.8 | 89.5 | 81.9 | 96.0 | 83.2 | 92.3 | 94.4 | 95.0 | 86.7 | 88.6±3.5 |
| AfroXLMR | 84.1 | 95.3 | 84.6 | 98.3 | 96.0 | 88.2 | 93.3 | 85.2 | **88.3** | 95.3 | 85.5 | 97.8 | 88.8 | 95.8 | 97.3 | 96.1 | 89.0 | 92.2±3.0 |
| AfroXLMR 76L | 84.5 | **95.5** | **90.4** | 98.7 | 96.3 | **89.4** | 94.6 | 91.3 | **88.3** | 95.1 | 86.8 | **98.1** | **93.6** | **96.2** | 96.9 | **97.7** | **89.8** | **93.7±2.1** |

Multi-lingual training

| Model | eng | amh | ewe | hau | ibo | kin | lin | lug | orm | sna | sot | swa | twi | wol | xho | yor | zul | AVG |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AfroXLMR-large-76L-Injongo-intent | 89.0 | 96.0 | 92.6 | 99.2 | 96.6 | 87.7 | 95.9 | 92.3 | 92.9 | 96.5 | 87.6 | 97.8 | 94.2 | 97.1 | 97.3 | 97.9 | 89.2 | 94.4±2.0 |
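
The row above is the companion intent-detection checkpoint. As a sketch, it could be queried through the text-classification pipeline; the repository ID below assumes the intent model is published under the same McGill-NLP organization as this slot model, and may differ.

```python
from transformers import pipeline

# Assumed repo ID for the companion intent model; adjust if it is
# published under a different name or organization.
intent_clf = pipeline(
    "text-classification",
    model="McGill-NLP/AfroXLMR-large-76L-Injongo-intent",
)

# Returns the predicted INJONGO intent class with a confidence score.
print(intent_clf("wake me up at nine am on friday"))
```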

Slot Filling (F1)

In-language training

| Model | eng | amh | ewe | hau | ibo | kin | lin | lug | orm | sna | sot | swa | twi | wol | xho | yor | zul | AVG |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mT5-Large | 73.7 | 80.9 | 71.6 | 89.4 | 80.5 | 74.2 | 82.6 | 78.9 | 72.1 | 81.1 | 74.7 | 88.1 | 79.0 | 76.9 | 88.4 | 78.9 | 68.3 | 79.1±3.7 |
| AfriTeVa V2 (T5) | 73.6 | 80.9 | 74.5 | 93.8 | 79.9 | 76.6 | 87.1 | 85.2 | 79.0 | 82.1 | **77.5** | 88.9 | 84.0 | 79.0 | 90.0 | 87.2 | 71.2 | 82.3±3.3 |
| NLLB LLM2Vec | 74.6 | 82.4 | 80.5 | 93.6 | 78.1 | 70.1 | 84.8 | 86.6 | 80.8 | 81.4 | 74.8 | 85.7 | 85.7 | 78.3 | 88.0 | 85.0 | 78.3 | 82.1±3.1 |
| XLM-RoBERTa | 77.9 | 84.8 | 79.9 | 93.9 | 76.6 | 69.3 | 86.3 | 83.8 | 83.8 | 79.3 | 71.7 | 88.7 | 84.2 | 79.3 | 89.1 | 83.9 | 79.4 | 82.1±3.5 |
| AfriBERTa V2 | 70.7 | 82.2 | 77.9 | 93.7 | 78.3 | 73.8 | 84.4 | 84.1 | 81.0 | 81.8 | 73.5 | 87.6 | 81.9 | 78.3 | 88.5 | 86.2 | 79.6 | 82.1±2.9 |
| AfroXLMR | **79.0** | 86.2 | 81.6 | **95.1** | **82.0** | 76.3 | 87.1 | 88.5 | 84.9 | **84.9** | **77.5** | **90.2** | 85.5 | **81.7** | **91.1** | 87.3 | **82.5** | 85.2±2.7 |
| AfroXLMR 76L | 78.7 | **86.3** | **84.5** | 94.3 | 81.9 | **76.7** | **88.0** | **88.8** | **85.5** | **84.9** | 77.4 | **90.2** | **89.8** | 81.3 | 90.5 | **88.1** | 81.3 | **85.6±2.7** |

Multi-lingual training

| Model | eng | amh | ewe | hau | ibo | kin | lin | lug | orm | sna | sot | swa | twi | wol | xho | yor | zul | AVG |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AfroXLMR-large-76L-Injongo-slot | 82.4 | 88.2 | 87.0 | 96.3 | 84.0 | 79.3 | 90.3 | 89.2 | 87.2 | 86.1 | 80.4 | 90.5 | 90.3 | 83.3 | 91.8 | 90.2 | 83.3 | 87.3±2.4 |
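
Slot filling is scored with span-level F1 over BIO-tagged slot spans. The sketch below shows how such a score can be computed with the seqeval library; the tag sequences and slot names are invented for illustration, and the authors' exact evaluation setup may differ.

```python
from seqeval.metrics import f1_score

# Invented gold and predicted BIO tag sequences for one utterance;
# real labels come from the INJONGO slot schema.
gold = [["O", "O", "O", "B-time", "I-time", "O", "B-date"]]
pred = [["O", "O", "O", "B-time", "I-time", "O", "O"]]

# seqeval scores whole spans: a slot counts as correct only when both
# its boundaries and its label match the gold span exactly.
print(f"Span-level F1: {f1_score(gold, pred):.3f}")  # -> 0.667
```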

Language Codes

  • eng: English
  • amh: Amharic
  • ewe: Ewe
  • hau: Hausa
  • ibo: Igbo
  • kin: Kinyarwanda
  • lin: Lingala
  • lug: Luganda
  • orm: Oromo
  • sna: Shona
  • sot: Sesotho
  • swa: Swahili
  • twi: Twi
  • wol: Wolof
  • xho: Xhosa
  • yor: Yoruba
  • zul: Zulu

Notes

  • Bold values mark the best score per language in each in-language training table
  • AfroXLMR 76L shows the top overall in-language performance
  • Multi-lingual training generally outperforms in-language training
  • Standard deviations are reported alongside average scores
  • AVG does not include the English results (see the verification sketch below)
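
As a quick check of the AVG convention, the snippet below recomputes the English-excluded mean for the multilingual slot-filling row above. The reported ±2.4 comes from the paper's own evaluation and is not derived from this row.

```python
# Multilingual slot-filling scores from the table above, keyed by language.
scores = {
    "eng": 82.4, "amh": 88.2, "ewe": 87.0, "hau": 96.3, "ibo": 84.0,
    "kin": 79.3, "lin": 90.3, "lug": 89.2, "orm": 87.2, "sna": 86.1,
    "sot": 80.4, "swa": 90.5, "twi": 90.3, "wol": 83.3, "xho": 91.8,
    "yor": 90.2, "zul": 83.3,
}

# AVG excludes English: the mean runs over the 16 African languages only.
african = [score for lang, score in scores.items() if lang != "eng"]
print(f"AVG = {sum(african) / len(african):.1f}")  # -> AVG = 87.3
```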

Citation

```bibtex
@misc{yu2025injongo,
      title={INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages},
      author={Hao Yu and Jesujoba O. Alabi and Andiswa Bukula and Jian Yun Zhuang and En-Shiun Annie Lee and Tadesse Kebede Guge and Israel Abebe Azime and Happy Buzaaba and Blessing Kudzaishe Sibanda and Godson K. Kalipe and Jonathan Mukiibi and Salomon Kabongo Kabenamualu and Mmasibidi Setaka and Lolwethu Ndolela and Nkiruka Odu and Rooweither Mabuya and Shamsuddeen Hassan Muhammad and Salomey Osei and Sokhar Samb and Juliet W. Murage and Dietrich Klakow and David Ifeoluwa Adelani},
      year={2025},
      eprint={2502.09814},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.09814}
}

@misc{adelani2023sib200,
      title={SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects},
      author={David Ifeoluwa Adelani and Hannah Liu and Xiaoyu Shen and Nikita Vassilyev and Jesujoba O. Alabi and Yanke Mao and Haonan Gao and Annie En-Shiun Lee},
      year={2023},
      eprint={2309.07445},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

Model Details

  • Model size: 559M parameters
  • Tensor type: F32 (Safetensors)
