localenlp-en-wol

Fine-tuned MarianMT model for English-to-Wolof translation.

Model Card for LOCALENLP/english-wolof

This is a machine translation model for English → Wolof, developed by the LOCALENLP organization.
It is based on the pretrained Helsinki-NLP/opus-mt-en-mul MarianMT model and fine-tuned on a custom parallel corpus of ~84k sentence pairs.


Model Details

Model Description

  • Developed by: LOCALENLP
  • Shared by: LOCALENLP
  • Model type: Seq2Seq Transformer (MarianMT)
  • Model size: 77M parameters (F32, Safetensors)
  • Languages: English → Wolof
  • License: MIT
  • Fine-tuned from model: Helsinki-NLP/opus-mt-en-mul

Uses

Direct Use

  • Translate English text into Wolof for research, education, and communication.
  • Useful for low-resource NLP tasks, digital content creation, and cultural preservation.

Downstream Use

  • Can be integrated into translation apps, chatbots, and education platforms.
  • Serves as a base for further fine-tuning on domain-specific Wolof corpora.

Out-of-Scope Use

  • Not suitable for high-stakes legal or medical translations (e.g., contracts, prescriptions, medical records) without human review.
  • Like any automated system, the model can mistranslate; always review output before relying on it.

Bias, Risks, and Limitations

  • Training data is from a custom collection of parallel sentences (~84k pairs).
  • Some informal or culturally nuanced expressions may not be accurately translated.
  • Wolof spelling and grammar variation (Latin script) may lead to inconsistencies.
  • Model may underperform on domain-specific or long, complex texts.
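Since the model may degrade on long, complex inputs, one common workaround is to split long text into sentence-sized chunks and translate each chunk separately. A minimal sketch using only the standard library (the naive punctuation-based splitter and the 400-character default are illustrative assumptions, not properties of the model):

```python
import re

def split_sentences(text, max_chars=400):
    """Split text into chunks of at most max_chars, breaking on sentence ends."""
    # Naive splitter: break after ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be translated independently and the Wolof outputs rejoined; for careful work, a proper sentence segmenter is preferable to this regex.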

Recommendations

  • Use human post-editing for high-stakes use cases.
  • Evaluate performance on your target domain before deployment.
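For domain evaluation, standard MT metrics such as BLEU or chrF (e.g., via the sacrebleu library) against reference translations are the usual choice. Purely as an illustration of the workflow, a crude stdlib-only similarity proxy might look like the following; it is not a substitute for a real metric:

```python
from difflib import SequenceMatcher

def rough_similarity(hypothesis, reference):
    """Character-level similarity in [0, 1]; a crude proxy, not BLEU/chrF."""
    return SequenceMatcher(None, hypothesis, reference).ratio()

def corpus_score(hypotheses, references):
    """Average pairwise similarity over a held-out test set."""
    pairs = list(zip(hypotheses, references))
    return sum(rough_similarity(h, r) for h, r in pairs) / len(pairs)
```

Running `corpus_score` on a small held-out sample from your target domain gives a first signal before investing in human post-editing or proper metric setup.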

How to Get Started with the Model

from transformers import MarianTokenizer, AutoModelForSeq2SeqLM

model_name = "LOCALENLP/english-wolof"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# The multilingual base model selects the output language via a target token,
# so every input is prefixed with ">>wol<<".
text = "Good evening, how was your day?"
inputs = tokenizer(">>wol<< " + text, return_tensors="pt", padding=True, truncation=True)
outputs = model.generate(**inputs, max_length=512, num_beams=4)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("English:", text)
print("Wolof:", translation)
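To translate several sentences at once, the same ">>wol<<" target token must be prepended to every input before tokenizing with padding=True. A small helper for that step (the function name is our own, not part of the model's API):

```python
def prepare_batch(texts, lang_token=">>wol<<"):
    """Prefix each input with the target-language token the base model expects."""
    return [f"{lang_token} {text}" for text in texts]
```

The resulting list can be passed directly to the tokenizer (`tokenizer(prepare_batch(texts), return_tensors="pt", padding=True, truncation=True)`) and then to `model.generate` as in the snippet above, decoding each row of the output with `tokenizer.batch_decode`.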