mbart-translation-DDHH
This model is a fine-tuned version of facebook/mbart-large-50 trained on a parallel English–Spanish corpus of the Universal Declaration of Human Rights. It achieves the following results on the evaluation set:
- Loss: 3.6758
- Bleu: 33.4177
- Gen Len: 25.95
Model description
This model is based on mBART-50, a multilingual sequence-to-sequence model, and has been fine-tuned to improve the quality of translations between English and Spanish for the specific domain of legal/human-rights text. It is designed to produce fluent and accurate sentence-level translations that maintain the formal tone and legal register of the source material.
Intended uses & limitations
Intended use: automatic English↔Spanish translation of legal or policy-oriented texts, especially those similar in style to the Universal Declaration of Human Rights.
Limitations: the model was fine-tuned on a single document (the Universal Declaration of Human Rights) and may not generalize well to informal or highly technical text outside this domain.
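A minimal inference sketch for the intended use above. It assumes the checkpoint is available under the repository ID `macordob/mbart-translation-DDHH` and that it keeps the standard mBART-50 language codes `en_XX` and `es_XX`. The names `MBART50_CODES`, `language_codes`, and `translate` are illustrative and not part of any library.

```python
# Standard mBART-50 language codes for the two languages this card covers.
MBART50_CODES = {"en": "en_XX", "es": "es_XX"}

# Assumed Hub repository ID for this checkpoint.
MODEL_ID = "macordob/mbart-translation-DDHH"


def language_codes(source: str, target: str) -> tuple[str, str]:
    """Map ISO codes ("en", "es") to mBART-50 codes, rejecting other pairs."""
    if source == target or source not in MBART50_CODES or target not in MBART50_CODES:
        raise ValueError(f"unsupported direction: {source}->{target}")
    return MBART50_CODES[source], MBART50_CODES[target]


def translate(sentences, source="en", target="es", model_id=MODEL_ID, max_new_tokens=128):
    """Translate a list of sentences; downloads the checkpoint on first use."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    src_code, tgt_code = language_codes(source, target)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # The tokenizer prepends this language tag to every source sentence.
    tokenizer.src_lang = src_code
    inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(
        **inputs,
        # Force the decoder to start with the target-language tag.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_code),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Calling `translate(["Everyone has the right to life, liberty and security of person."])` would return a list with one Spanish string per input sentence. Translation quality in the Spanish-to-English direction is not reported on this card.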
Training and evaluation data
The model was fine-tuned on a parallel corpus of the Universal Declaration of Human Rights in English and Spanish. The training set included sentence-aligned text segments extracted from publicly available translations of the declaration.
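The card does not publish the corpus or its schema. As a sketch only, sentence-aligned pairs could be stored in the `translation` record layout commonly used for translation data with the Datasets library. The function name `to_translation_records` is hypothetical, and the sample pair is Article 3 of the declaration.

```python
def to_translation_records(en_sentences, es_sentences):
    """Pair aligned sentences as {"translation": {"en": ..., "es": ...}} records."""
    if len(en_sentences) != len(es_sentences):
        raise ValueError("sentence lists must be aligned one-to-one")
    records = []
    for en, es in zip(en_sentences, es_sentences):
        en, es = en.strip(), es.strip()
        if en and es:  # drop pairs where either side is empty
            records.append({"translation": {"en": en, "es": es}})
    return records


records = to_translation_records(
    ["Everyone has the right to life, liberty and security of person."],
    ["Todo individuo tiene derecho a la vida, a la libertad y a la seguridad de su persona."],
)
```

A list in this shape can be passed to `datasets.Dataset.from_list` to build a dataset for fine-tuning.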
Training procedure
The fine-tuning was conducted on a single GPU using the transformers library from Hugging Face.
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5.6e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 2
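The hyperparameters listed here map onto `Seq2SeqTrainingArguments` roughly as follows. This is a sketch, not the author's training script: the batch sizes are treated as per-device values because training used a single GPU, while `predict_with_generate=True` and the default `output_dir` are assumptions. The names `HYPERPARAMETERS` and `build_training_args` are illustrative.

```python
# Values copied from the hyperparameter list on this card.
HYPERPARAMETERS = {
    "learning_rate": 5.6e-05,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 2,
}


def build_training_args(output_dir="mbart-translation-DDHH"):
    """Build Seq2SeqTrainingArguments; needs transformers and PyTorch installed."""
    # Imported lazily so the dictionary above can be inspected without transformers.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(
        output_dir=output_dir,
        # Assumption: BLEU and Gen Len are scored on generated text, not logits.
        predict_with_generate=True,
        **HYPERPARAMETERS,
    )
```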
Training results
| Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
|---|---|---|---|---|---|
| No log | 1.0 | 10 | 3.8374 | 28.2511 | 23.65 |
| No log | 2.0 | 20 | 3.6758 | 33.4177 | 25.95 |
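The step counts in the table bound the size of the training set, and the linear scheduler fixes the learning rate at every step. The sketch below works both out, assuming one optimizer step per batch (no gradient accumulation), a partial final batch being kept, and zero warmup steps. These match the Trainer defaults, but the card does not state them. The function name `linear_lr` is illustrative.

```python
STEPS_PER_EPOCH = 10  # step count after epoch 1 in the results table
BATCH_SIZE = 8        # train_batch_size
NUM_EPOCHS = 2
PEAK_LR = 5.6e-05     # learning_rate

# Ten batches of at most 8 examples cover between 73 and 80 sentence pairs.
max_examples = STEPS_PER_EPOCH * BATCH_SIZE
min_examples = (STEPS_PER_EPOCH - 1) * BATCH_SIZE + 1
total_steps = STEPS_PER_EPOCH * NUM_EPOCHS


def linear_lr(step, peak=PEAK_LR, total=total_steps, warmup=0):
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))


for step in (0, 10, 20):
    print(step, linear_lr(step))
```

Under these assumptions the training split holds between 73 and 80 sentence pairs, and the learning rate falls to half its peak by the end of the first epoch and to zero at step 20. The small split size is worth keeping in mind when reading the BLEU score.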
Framework versions
- Transformers 4.52.4
- Pytorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.1
Model tree for macordob/mbart-translation-DDHH
Base model
facebook/mbart-large-50