NLLB-200 Fine-tuned for English-Tamazight Translation
A fine-tuned version of NLLB-200-distilled-600M for English ↔ Tamazight (Kabyle Latin script) translation, trained on a comprehensive dictionary dataset.
Model Description
This model is a fine-tuned version of facebook/nllb-200-distilled-600M specifically adapted for English-Tamazight translation. It was trained on ~9,000 translation pairs from a curated dictionary dataset containing vocabulary, verb conjugations, country names, and cultural phrases.
- Developed by: Abdeljalil Ounaceur
- Model type: Sequence-to-sequence transformer (fine-tuned NLLB)
- Language(s): English (en), Tamazight/Kabyle Latin script (kab_Latn)
- License: CC-BY-NC-4.0
- Finetuned from: facebook/nllb-200-distilled-600M
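The exact training script is not published with this card; the following is a minimal sketch of how a comparable fine-tuning run on dictionary-style pairs can be set up with the Hugging Face Seq2SeqTrainer. The dataset file, column names, and hyperparameters are illustrative assumptions, not the configuration actually used for this model.

from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

base = "facebook/nllb-200-distilled-600M"
# NLLB tokenizers accept source/target language codes directly
tokenizer = AutoTokenizer.from_pretrained(base, src_lang="eng_Latn", tgt_lang="kab_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(base)

# Hypothetical dictionary corpus with "en" and "kab" text columns
dataset = load_dataset("csv", data_files="dictionary_pairs.csv")["train"]

def preprocess(batch):
    # Tokenize source text and target text (labels) in one call
    return tokenizer(batch["en"], text_target=batch["kab"],
                     max_length=64, truncation=True)

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="nllb-tamazight-souss",
    per_device_train_batch_size=16,  # illustrative values only
    learning_rate=2e-5,
    num_train_epochs=3,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()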
Intended Uses
Direct Use
- English to Tamazight translation for dictionary terms and basic phrases
- Tamazight to English translation
- Research in Berber language NLP
- Educational applications for Tamazight language learning
Limitations
- Experimental model: performance is mixed, with gains on dictionary terms but some degradation on general text
- Domain specificity: Optimized for dictionary-style translations rather than natural conversation
- Language variant: Some outputs may shift between Kabyle and Tachelhit variants
- Catastrophic forgetting: Some original NLLB capabilities were lost during fine-tuning
How to Use
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Abdeljalil-Ounaceur/nllb-tamazight-souss")
model = AutoModelForSeq2SeqLM.from_pretrained("Abdeljalil-Ounaceur/nllb-tamazight-souss")

def translate(text, src_lang="eng_Latn", tgt_lang="kab_Latn"):
    # Tell the tokenizer which language the input is in
    tokenizer.src_lang = src_lang
    inputs = tokenizer(text, return_tensors="pt")
    # Force the decoder to start with the target-language token
    forced_bos_token_id = tokenizer.convert_tokens_to_ids(tgt_lang)
    generated_tokens = model.generate(
        **inputs,
        forced_bos_token_id=forced_bos_token_id,
        max_length=50,
        num_beams=4,
        early_stopping=True,
    )
    return tokenizer.decode(generated_tokens[0], skip_special_tokens=True)

# Example usage (English -> Tamazight)
print(translate("house"))  # Expected: tamdint
print(translate("water"))  # Expected: aman