Bambara FastText Embeddings

Model Description

This model provides FastText word embeddings for the Bambara language (Bamanankan), a Mande language spoken primarily in Mali. The embeddings capture semantic relationships between Bambara words and enable various NLP tasks for this low-resource African language.

Model Type: FastText Word Embeddings
Language: Bambara (bm)
License: Apache 2.0

Model Details

Model Architecture

  • Algorithm: FastText with subword information
  • Vector Dimension: 300
  • Vocabulary Size: 9,973 unique Bambara words
  • Training Method: Skip-gram with negative sampling
  • Subword Information: Character n-grams (enables handling of out-of-vocabulary words; see the sketch after this list)
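
FastText represents a word as the combination of its own vector and the vectors of its character n-grams, which is what lets it compose vectors for words never seen during training. The snippet below is a minimal illustration of that n-gram decomposition, assuming fastText's default n-gram range of 3–6 characters; the Bambara word used is only an example.

```python
def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> list[str]:
    """Enumerate the character n-grams FastText extracts for a word.

    FastText wraps the word in '<' and '>' boundary markers before
    extracting n-grams, so prefixes and suffixes stay distinguishable.
    """
    wrapped = f"<{word}>"
    ngrams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            ngrams.append(wrapped[i:i + n])
    return ngrams

# Example with the Bambara word "dumuni" ("food"): an unseen form such as
# "dumuniw" (plural, marked with -w) shares most of these n-grams, so a
# sensible vector can still be composed for it.
print(char_ngrams("dumuni"))
```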

Training Data

The model was trained on Bambara text corpora, building on David Ifeoluwa Adelani's research on African language embeddings.
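
The training script itself is not included in this card. As a rough sketch only, embeddings with the hyperparameters listed above (300 dimensions, skip-gram with negative sampling, character n-grams) could be reproduced with gensim's FastText implementation along the lines below; the corpus file name, epoch count, and any hyperparameter not stated in this card are assumptions.

```python
from gensim.models import FastText

# Assumed corpus path: one tokenized Bambara sentence per line.
# Window size and negative-sample count follow gensim defaults;
# the epoch count is an assumption.
model = FastText(
    corpus_file="bambara_corpus.txt",
    vector_size=300,    # matches the 300-dimensional vectors described above
    sg=1,               # skip-gram
    negative=5,         # negative sampling
    min_n=3, max_n=6,   # character n-gram range
    epochs=10,
)
model.wv.save("bambara_fasttext.kv")
```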

Intended Use

This model is designed for:

  • Semantic similarity tasks in Bambara
  • Information retrieval for Bambara documents
  • Cross-lingual research involving Bambara
  • Cultural preservation and digital humanities projects
  • Educational applications for Bambara language learning
  • Foundation for downstream NLP tasks in Bambara

Usage

  Coming soon
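
Until official instructions are published, FastText embeddings distributed in the standard Facebook binary format are commonly loaded with gensim, roughly as sketched below. The filename `bambara_fasttext.bin` is an assumption, not the actual artifact name in this repository.

```python
from gensim.models.fasttext import load_facebook_vectors

# Assumed filename -- replace with the .bin file actually shipped with this model.
vectors = load_facebook_vectors("bambara_fasttext.bin")

# Look up the 300-dimensional vector for an in-vocabulary word.
vec = vectors["muso"]          # "muso" = woman
print(vec.shape)               # (300,)

# Nearest neighbours by cosine similarity.
print(vectors.most_similar("muso", topn=5))

# Thanks to subword n-grams, out-of-vocabulary forms still receive a vector.
print(vectors["musow"].shape)  # plural form, composed from character n-grams
```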