Update README.md
README.md CHANGED
@@ -46,7 +46,7 @@ Plume is the first LLM trained for Neural Machine Translation with only parallel
In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methodologies predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving unexplored the challenges of training LLMs solely on parallel data. In this work, we introduce Plume (**P**arallel **L**ang**u**age **M**od**e**l), a collection of three 2B LLMs featuring varying vocabulary sizes (32k, 128k, and 256k) trained exclusively on Catalan-centric parallel examples. These models perform comparably to previous encoder-decoder architectures on 16 supervised translation directions and 56 zero-shot ones.

-For more details regarding the model architecture, the dataset and model interpretability take a look at the paper which is available on [arXiv]().
+For more details regarding the model architecture, the dataset, and model interpretability, take a look at the paper, which is available on [arXiv](https://arxiv.org/abs/2406.09140).

## Intended Uses and Limitations
@@ -114,7 +114,13 @@ Below are the evaluation results on Flores-200 and NTREX for supervised MT direc
## Citation
```bibtex
-
+@misc{gilabert2024investigating,
+      title={Investigating the translation capabilities of Large Language Models trained on parallel data only},
+      author={Javier García Gilabert and Carlos Escolano and Aleix Sant Savall and Francesca De Luca Fornaciari and Audrey Mash and Xixian Liao and Maite Melero},
+      year={2024},
+      eprint={2406.09140},
+      archivePrefix={arXiv}
+}
```
## Additional information
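
Below is a minimal usage sketch of how a decoder-only translation model like the ones described in this README is typically loaded and prompted with the Hugging Face `transformers` library. The checkpoint name (`projecte-aina/Plume32k`) and the bracketed language-tag prompt format are illustrative assumptions, not details taken from this diff; check the released model cards and the linked paper for the exact repository names and prompting scheme.

```python
# Hypothetical usage sketch: the model ID and prompt format below are assumptions
# for illustration, not confirmed by this README diff.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "projecte-aina/Plume32k"  # assumed Hub name for the 32k-vocabulary variant

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Assumed prompt format: source-language tag, source sentence, then the
# target-language tag that the model completes with the translation.
src_tag, tgt_tag = "spa_Latn", "cat_Latn"
source_sentence = "El cielo está despejado hoy."
prompt = f"<s> [{src_tag}] {source_sentence} \n[{tgt_tag}]"

inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=5)

# Decode only the tokens generated after the prompt.
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```

Beam search with a small `max_new_tokens` budget is a common default for sentence-level translation, but these decoding settings are likewise assumptions rather than the authors' recommended configuration.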