localenlp-en-wol
Fine-tuned MarianMT model for English-to-Wolof translation.
Model Card for LOCALENLP/english-wolof
This is a machine translation model for English → Wolof, developed by the LOCALENLP organization.
It is based on the pretrained Helsinki-NLP/opus-mt-en-mul MarianMT model and fine-tuned on a custom parallel corpus of ~84k sentence pairs.
Model Details
Model Description
- Developed by: LOCALENLP
- Funded by: N/A
- Shared by: LOCALENLP
- Model type: Seq2Seq Transformer (MarianMT)
- Languages: English → Wolof
- License: MIT
- Finetuned from model: Helsinki-NLP/opus-mt-en-mul
Model Sources
- Repository: https://huggingface.co/LOCALENLP/english-wolof
- Demo: Gradio / web app integration planned
Uses
Direct Use
- Translate English text into Wolof for research, education, and communication.
- Useful for low-resource NLP tasks, digital content creation, and cultural preservation.
Downstream Use
- Can be integrated into translation apps, chatbots, and education platforms.
- Serves as a base for further fine-tuning on domain-specific Wolof corpora (see the fine-tuning sketch below).
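The sketch below illustrates one way such domain-specific fine-tuning could look with the Hugging Face Seq2SeqTrainer. The CSV file name, the en/wol column names, and all hyperparameters are placeholder assumptions, not the settings used to train this model.

```python
from datasets import load_dataset
from transformers import (
    MarianTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "LOCALENLP/english-wolof"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Hypothetical domain corpus with "en" and "wol" text columns.
dataset = load_dataset("csv", data_files={"train": "my_domain_corpus.csv"})

def preprocess(batch):
    # Keep the >>wol<< target-language token, matching the base model's convention.
    sources = [">>wol<< " + s for s in batch["en"]]
    return tokenizer(sources, text_target=batch["wol"], max_length=128, truncation=True)

tokenized = dataset["train"].map(
    preprocess, batched=True, remove_columns=dataset["train"].column_names
)

# Illustrative hyperparameters; tune for your corpus size and hardware.
args = Seq2SeqTrainingArguments(
    output_dir="en-wol-domain-finetuned",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=2e-5,
    logging_steps=100,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```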
Out-of-Scope Use
- Not suitable for unreviewed legal or medical translations (e.g., contracts, prescriptions, medical records).
- Like any automated system, the model can mistranslate; human review is recommended for sensitive content.
Bias, Risks, and Limitations
- Training data is from a custom collection of parallel sentences (~84k pairs).
- Some informal or culturally nuanced expressions may not be accurately translated.
- Wolof spelling and grammar variation (Latin script) may lead to inconsistencies.
- Model may underperform on domain-specific or long, complex texts.
Recommendations
- Use human post-editing for high-stakes use cases.
- Evaluate performance on your target domain before deployment (see the evaluation sketch below).
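As a minimal sketch of that evaluation step, the snippet below scores the model on a handful of held-out sentence pairs with sacreBLEU. The example sentences are placeholders, and the references must be filled in with gold Wolof translations from your own domain data.

```python
import sacrebleu
from transformers import MarianTokenizer, AutoModelForSeq2SeqLM

model_name = "LOCALENLP/english-wolof"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Placeholder held-out pairs from your target domain.
sources = ["The clinic opens at nine in the morning.", "Please sign the attendance sheet."]
references = ["...", "..."]  # replace with gold Wolof reference translations

inputs = tokenizer(
    [">>wol<< " + s for s in sources],
    return_tensors="pt", padding=True, truncation=True,
)
outputs = model.generate(**inputs, max_length=512, num_beams=4)
hypotheses = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Corpus-level BLEU against a single reference set.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"Domain BLEU: {bleu.score:.2f}")
```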
How to Get Started with the Model
```python
from transformers import MarianTokenizer, AutoModelForSeq2SeqLM

model_name = "LOCALENLP/english-wolof"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "Good evening, how was your day?"

# The >>wol<< prefix is the MarianMT target-language token inherited from opus-mt-en-mul.
inputs = tokenizer(">>wol<< " + text, return_tensors="pt", padding=True, truncation=True)
outputs = model.generate(**inputs, max_length=512, num_beams=4)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("English:", text)
print("Wolof:", translation)
```
Evaluation results
- BLEU on English-Wolof custom dataset (self-reported): 76.12