YAML Metadata
Error:
"datasets[1]" with value "ParaCrawl 99k" is not valid. If possible, use a dataset id from https://hf.co/datasets.
YAML Metadata
Error:
"datasets[5]" with value "Biomedical Domain Corpora" is not valid. If possible, use a dataset id from https://hf.co/datasets.
Introduction
This repository brings an implementation of T5 for translation in PT-EN tasks using a modest hardware setup. We propose some changes in tokenizator and post-processing that improves the result and used a Portuguese pretrained model for the translation. You can collect more informations in our repository. Also, check our paper!
Usage
Just follow "Use in Transformers" instructions. It is necessary to add a few words before to define the task to T5.
You can also create a pipeline for it. An example with the phrase " Eu gosto de comer arroz" is:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
tokenizer = AutoTokenizer.from_pretrained("unicamp-dl/translation-pt-en-t5")
model = AutoModelForSeq2SeqLM.from_pretrained("unicamp-dl/translation-pt-en-t5")
pten_pipeline = pipeline('text2text-generation', model=model, tokenizer=tokenizer)
pten_pipeline("translate Portuguese to English: Eu gosto de comer arroz.")
Citation
@inproceedings{lopes-etal-2020-lite,
title = "Lite Training Strategies for {P}ortuguese-{E}nglish and {E}nglish-{P}ortuguese Translation",
author = "Lopes, Alexandre and
Nogueira, Rodrigo and
Lotufo, Roberto and
Pedrini, Helio",
booktitle = "Proceedings of the Fifth Conference on Machine Translation",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.wmt-1.90",
pages = "833--840",
}
- Downloads last month
- 916
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.