nllb-200-distilled-600M-en2bem
This model is a fine-tuned version of facebook/nllb-200-distilled-600M on the original Big-C dataset. It achieves the following results on the evaluation set:
- Loss: 0.3204
- Bleu: 8.51
- Chrf: 48.32
- Wer: 83.1036
Model description
This is a machine translation model that translates English to Bemba. It was fine-tuned from facebook/nllb-200-distilled-600M.
Intended uses & limitations
This is an English-to-Bemba translation model. It was used for data augmentation.
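A minimal inference sketch with the Transformers library follows. It is not taken from the authors' code. The repository id `kreasof-ai/nllb-200-600M-eng2bem` is an assumption, as is the idea that the fine-tuned checkpoint keeps NLLB's language codes (`eng_Latn` for English, `bem_Latn` for Bemba). The helper name `translate` is hypothetical.

```python
# Assumed repository id and NLLB (FLORES-200) language codes; adjust if they differ.
MODEL_ID = "kreasof-ai/nllb-200-600M-eng2bem"
SRC_LANG = "eng_Latn"  # English
TGT_LANG = "bem_Latn"  # Bemba


def translate(texts, model_id=MODEL_ID, max_new_tokens=128):
    """Translate a list of English sentences to Bemba (hypothetical helper)."""
    # Imported lazily so the constants above can be used without loading the library.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang=SRC_LANG)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # NLLB selects the output language by forcing the first generated token
    # to be the target language code.
    outputs = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(TGT_LANG),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Calling `translate(["How are you today?"])` downloads the weights on first use and returns a list with one Bemba string. Given the reported BLEU of 8.51, outputs are probably best treated as noisy synthetic data rather than publication-quality translations.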
Training and evaluation data
This model was trained on the combined train and validation splits (train+val) of the Big-C dataset. Evaluation used the Big-C test split.
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: adamw_torch with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 3
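To make the schedule settings concrete, the sketch below works out the warmup length and the learning rate at a given step. It assumes the run lasted 15,720 optimizer steps in total (3 epochs of 5,240 steps, as reported in the training results) and that the warmup step count is the ceiling of `total_steps * warmup_ratio`, which is how the Transformers `Trainer` derives it. The function names are hypothetical.

```python
import math

LEARNING_RATE = 1e-4   # learning_rate from the list above
WARMUP_RATIO = 0.03    # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 15720    # assumed: 3 epochs x 5240 steps per epoch


def warmup_steps(total_steps=TOTAL_STEPS, ratio=WARMUP_RATIO):
    """Number of warmup steps, rounded up."""
    return math.ceil(total_steps * ratio)


def linear_lr(step, total_steps=TOTAL_STEPS, peak=LEARNING_RATE, ratio=WARMUP_RATIO):
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    warmup = warmup_steps(total_steps, ratio)
    if step < warmup:
        # Ramp linearly from 0 up to the peak learning rate.
        return peak * step / max(1, warmup)
    # Decay linearly from the peak down to 0 at the final step.
    return peak * max(0.0, (total_steps - step) / max(1, total_steps - warmup))
```

Under these assumptions the learning rate climbs for the first 472 steps, peaks at 0.0001, and then falls linearly to zero at step 15,720.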
Training results
Training Loss | Epoch | Step | Validation Loss | Bleu | Chrf | Wer |
---|---|---|---|---|---|---|
0.2594 | 1.0 | 5240 | 0.3208 | 7.99 | 47.42 | 83.9565 |
0.2469 | 2.0 | 10480 | 0.3169 | 8.08 | 47.92 | 83.4161 |
0.2148 | 3.0 | 15720 | 0.3204 | 8.51 | 48.32 | 83.1036 |
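The table can be read in two ways depending on which metric drives checkpoint selection. The short sketch below copies the three rows and picks the best epoch per metric; the variable names are illustrative only.

```python
# Rows copied from the results table above.
ROWS = [
    {"epoch": 1, "val_loss": 0.3208, "bleu": 7.99, "chrf": 47.42, "wer": 83.9565},
    {"epoch": 2, "val_loss": 0.3169, "bleu": 8.08, "chrf": 47.92, "wer": 83.4161},
    {"epoch": 3, "val_loss": 0.3204, "bleu": 8.51, "chrf": 48.32, "wer": 83.1036},
]

# Lower is better for loss and WER; higher is better for BLEU and chrF.
best_by_loss = min(ROWS, key=lambda r: r["val_loss"])["epoch"]
best_by_wer = min(ROWS, key=lambda r: r["wer"])["epoch"]
best_by_bleu = max(ROWS, key=lambda r: r["bleu"])["epoch"]
best_by_chrf = max(ROWS, key=lambda r: r["chrf"])["epoch"]

print(best_by_loss, best_by_wer, best_by_bleu, best_by_chrf)
```

Validation loss is lowest after epoch 2, while BLEU, chrF, and WER are all best after epoch 3. The headline results at the top of this card correspond to the epoch 3 row.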
Framework versions
- Transformers 4.47.1
- Pytorch 2.5.1+cu121
- Datasets 3.4.0
- Tokenizers 0.21.0
Citation
@inproceedings{nllb2022,
title = {No Language Left Behind: Scaling Human-Centered Machine Translation},
author = {Costa-jussà, Marta R. and Cross, James and others},
booktitle = {Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
year = {2022},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2022.emnlp-main.9}
}
@inproceedings{sikasote-etal-2023-big,
title = "{BIG}-{C}: a Multimodal Multi-Purpose Dataset for {B}emba",
author = "Sikasote, Claytone and
Mukonde, Eunice and
Alam, Md Mahfuz Ibn and
Anastasopoulos, Antonios",
editor = "Rogers, Anna and
Boyd-Graber, Jordan and
Okazaki, Naoaki",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.acl-long.115",
doi = "10.18653/v1/2023.acl-long.115",
pages = "2062--2078",
abstract = "We present BIG-C (Bemba Image Grounded Conversations), a large multimodal dataset for Bemba. While Bemba is the most populous language of Zambia, it exhibits a dearth of resources which render the development of language technologies or language processing research almost impossible. The dataset is comprised of multi-turn dialogues between Bemba speakers based on images, transcribed and translated into English. There are more than 92,000 utterances/sentences, amounting to more than 180 hours of audio data with corresponding transcriptions and English translations. We also provide baselines on speech recognition (ASR), machine translation (MT) and speech translation (ST) tasks, and sketch out other potential future multimodal uses of our dataset. We hope that by making the dataset available to the research community, this work will foster research and encourage collaboration across the language, speech, and vision communities especially for languages outside the {``}traditionally{''} used high-resourced ones. All data and code are publicly available: \url{https://github.com/csikasote/bigc}.",
}
Contact
This model was trained by Hazim.
Acknowledgments
Huge thanks to Yasmin Moslem for her supervision, and Habibullah Akbar the founder of Kreasof-AI, for his leadership and support.