Model Card for dsfsi/OMT-LR-Mistral7b

The model is the result of fine-tuning Mistral-7B-v0.1 on a downstream translation task in a low-resource setting. It translates English sentences into Zulu and Xhosa.

Model Details

  • Authors: Pitso Walter Khoboko, Vukosi Marivate, Joseph Sefara
  • Affiliation: University of Pretoria, Data Science for Social Impact
  • License: CC-BY-4.0
  • Base model: mistralai/Mistral-7B-v0.1
  • Languages: English → Zulu, English → Xhosa
  • Model type: Causal LLM with prompt-based translation fine-tuning

Model Description

dsfsi/OMT-LR-Mistral7b was fine-tuned for 31 GPU days from the base model mistralai/Mistral-7B-v0.1. The fine-tuning was aimed at improving large language model translation for low-resource, morphologically rich African languages using a custom prompt format.

  • Developed by: Pitso Walter Khoboko, Vukosi Marivate and Joseph Sefara
  • Funded by: University of Pretoria and Data Science for Social Impact
  • Shared by: Pitso Walter Khoboko
  • Model type: Causal (decoder-only) language model fine-tuned for translation via custom prompts
  • Language(s) (NLP): English → Zulu, English → Xhosa
  • License: CC-BY-4.0
  • Finetuned from model: mistralai/Mistral-7B-v0.1

Model Sources

  • Repository: [More Information Needed]
  • Paper: Khoboko, P. W., Marivate, V., & Sefara, J. (2025). Optimizing translation for low-resource languages: Efficient fine-tuning with custom prompt engineering in large language models. Machine Learning with Applications, 20, 100649.

Uses

The model can be used to translate English to Zulu and Xhosa. With further improvement, it could translate domain-specific information from English into Zulu and Xhosa; for example, agricultural research written in English could be made accessible to small-scale farmers who speak Zulu or Xhosa. It could also be used in education to teach core subjects in native South African languages, which may improve pupils' performance in those subjects.

Direct Use

You can download the model, dsfsi/OMT-LR-Mistral7b, and prompt it to translate English sentences into Zulu or Xhosa.

Out-of-Scope Use

  • Translating full documents or complex legal/medical content.
  • Any politically sensitive, sexually biased, or harmful content generation.

Bias, Risks, and Limitations

  • Training data includes English intrusions in target languages (Zulu/Xhosa).
  • May hallucinate or degrade performance on domain-specific or long-form content.
  • Not tested extensively for dialectal variations or colloquial expressions.

Recommendations

  • For critical use (e.g., government or education), fine-tune further with clean, domain-specific parallel corpora.
  • Avoid deploying this model in zero-review production pipelines.

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "dsfsi/OMT-LR-Mistral7b"

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Wrap them in a text-generation pipeline and prompt for a translation;
# max_new_tokens caps the length of the generated output
translator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(translator("Translate to Zulu: The cow is eating grass.", max_new_tokens=64))

Training Details

Training Data

Note: The source datasets were collected individually and combined into a single multilingual dataset of English–Zulu and English–Xhosa sentence pairs.

Training Procedure

Preprocessing

Please see the project repository for the dataset cleanup and preparation code.

Training Hyperparameters

  • Training regime:

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.05,
    r=16,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'q_proj', 'v_proj', 'o_proj', 'gate_proj', 'down_proj', 'up_proj'],
)

training_args = TrainingArguments(
    optim="paged_adamw_8bit",
    per_device_train_batch_size=32,
    gradient_accumulation_steps=4,
    log_level="debug",
    save_steps=400,
    logging_steps=10,
    learning_rate=4e-4,
    num_train_epochs=2,
    warmup_steps=100,
    lr_scheduler_type="linear",
)
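
The training script itself is not included in this card, but the configuration above corresponds to a standard PEFT + Transformers LoRA fine-tuning loop. The sketch below is an illustrative reconstruction rather than the authors' actual code: the output directory, toy training example, prompt format, and tokenization settings are assumptions, and it additionally requires the peft, datasets, accelerate, and bitsandbytes (for paged_adamw_8bit) packages.

import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token  # Mistral ships without a pad token

model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")

# LoRA configuration as reported in this card
peft_config = LoraConfig(
    lora_alpha=16, lora_dropout=0.05, r=16, bias="none", task_type="CAUSAL_LM",
    target_modules=['k_proj', 'q_proj', 'v_proj', 'o_proj', 'gate_proj', 'down_proj', 'up_proj'],
)
model = get_peft_model(model, peft_config)

# Toy prompt-formatted example; the authors' custom prompt format is not documented here
texts = ["Translate to Zulu: The cow is eating grass.\nInkomo idla utshani."]
ds = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

# Hyperparameters as reported in this card; output_dir is a hypothetical placeholder
training_args = TrainingArguments(
    output_dir="omt-lr-mistral7b",
    optim="paged_adamw_8bit",
    per_device_train_batch_size=32,
    gradient_accumulation_steps=4,
    log_level="debug",
    save_steps=400,
    logging_steps=10,
    learning_rate=4e-4,
    num_train_epochs=2,
    warmup_steps=100,
    lr_scheduler_type="linear",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()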

Evaluation

Testing Data, Factors & Metrics

Testing Data

Metrics

  • BLEU: checks whether the model translates Zulu and Xhosa words correctly compared to the ground truth (a scoring sketch follows this list).
  • F1: evaluates larger linguistic units such as grammatical chunks and syntactic frames, making it more suitable for languages with complex syntactic structures.
  • G-Eva: uses embeddings to capture the contextual and semantic similarity between hypothesis and reference translations.
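
For reference, BLEU can be computed with the sacrebleu implementation in the Hugging Face evaluate library, as in the sketch below. This is an assumption about tooling rather than the authors' evaluation script, and it does not cover the F1 or G-Eva scores; the Zulu sentences are toy placeholders.

import evaluate

# sacrebleu takes plain-text hypotheses and one list of references per hypothesis
bleu = evaluate.load("sacrebleu")
predictions = ["Inkomo idla utshani."]            # model output (toy placeholder)
references = [["Inkomo idla utshani."]]           # ground-truth reference(s)
result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 2))                  # corpus-level BLEU on a 0-100 scale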

Results

Language Pair        BLEU   F1   G-Eva
English → Zulu         20   42     92%
English → Xhosa        14   38     91%

Summary

OMT-LR-Mistral7b fine-tunes Mistral-7B-v0.1 using custom prompt engineering for low-resource African languages, specifically English to Zulu and Xhosa translation. It was trained for 31 GPU days using a multilingual dataset to improve translation accuracy for morphologically rich languages.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware: 2 A100 48GB GPUs
  • Cloud Provider: Google Cloud Platform
  • Compute Region: europe-west4-a
  • Training Time: ~31 GPU days
  • Carbon Emissions: [More Information Needed]
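
A rough back-of-the-envelope estimate in the spirit of the ML CO2 Impact calculator can be derived from the figures above. Everything in the sketch below except the reported 31 GPU-days is an assumed value (average GPU power draw, PUE, grid carbon intensity), so the printed numbers are illustrative, not reported emissions.

# Illustrative only: power draw, PUE and grid carbon intensity are assumptions.
gpu_days = 31                        # reported total GPU-days (2 GPUs, ~15.5 wall-clock days)
gpu_hours = gpu_days * 24            # 744 GPU-hours
avg_power_kw = 0.3                   # assumed average draw per GPU, in kW
pue = 1.1                            # assumed data-centre power usage effectiveness
grid_kg_co2_per_kwh = 0.3            # assumed carbon intensity for the compute region

energy_kwh = gpu_hours * avg_power_kw * pue
emissions_kg = energy_kwh * grid_kg_co2_per_kwh
print(f"~{energy_kwh:.0f} kWh, ~{emissions_kg:.0f} kg CO2eq (illustrative)")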

Technical Specifications

Model Architecture and Objective

Decoder-only causal transformer (Mistral-7B-v0.1) adapted with LoRA and trained with a causal language modeling objective on prompt-formatted English–Zulu and English–Xhosa translation pairs.

Compute Infrastructure

Google Cloud Platform, compute region europe-west4-a (see Environmental Impact above).

Hardware

2 × A100 48GB GPUs; ~31 GPU days of training.

Software

Hugging Face Transformers with PEFT (LoRA), using the TrainingArguments listed under Training Hyperparameters.

Citation

BibTeX:

@article{khoboko2025optimizing,
  title     = {Optimizing translation for low-resource languages: Efficient fine-tuning with custom prompt engineering in large language models},
  author    = {Khoboko, Pitso Walter and Marivate, Vukosi and Sefara, Joseph},
  journal   = {Machine Learning with Applications},
  volume    = {20},
  pages     = {100649},
  year      = {2025},
  publisher = {Elsevier}
}

APA:

Khoboko, P. W., Marivate, V., & Sefara, J. (2025). Optimizing translation for low-resource languages: Efficient fine-tuning with custom prompt engineering in large language models. Machine Learning with Applications, 20, 100649.

Model Card Authors

  • Pitso Walter Khoboko

Model Card Contact

[More Information Needed]