---
language: pt
tags:
- portuguese
- brazil
- pt_BR
widget:
- text: gostei muito dessa <mask>
---
# BR_BERTo
A Brazilian Portuguese (pt_BR) masked language model (`RobertaForMaskedLM`) for text inference.
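A minimal usage sketch with the `transformers` fill-mask pipeline, reusing the widget sentence from this card. The Hub id `rdenadai/BR_BERTo` is an assumption inferred from the GitHub repository name, and `top_predictions` and `fill_mask` are hypothetical helper names, not part of any library.

```python
# Assumed Hub model id; this card does not state it explicitly.
MODEL_ID = "rdenadai/BR_BERTo"


def top_predictions(results, k=3):
    """Reduce fill-mask pipeline output to (token, score) pairs."""
    return [(r["token_str"].strip(), r["score"]) for r in results[:k]]


def fill_mask(text, model_id=MODEL_ID, k=3):
    """Predict the k most likely replacements for the <mask> token in text."""
    # Imported lazily so top_predictions stays usable without transformers installed.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model=model_id)
    return top_predictions(unmasker(text, top_k=k), k)
```

Calling `fill_mask("gostei muito dessa <mask>")` downloads the model on first use and returns the candidate tokens with their scores, highest first.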
## Params
Trained on a corpus of 6_993_330 sentences.
- Vocab size: 150_000
- RobertaForMaskedLM size: 512
- Num train epochs: 3
- Time to train: ~10 days (on GCP with an NVIDIA T4)
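As a rough sanity check, the figures above imply an average training throughput. The sketch below only does that arithmetic. Because the training time is given as "~10 days", the result is an approximation, and `sentences_per_second` is a hypothetical helper name.

```python
SENTENCES = 6_993_330  # corpus size from the list above
EPOCHS = 3             # number of training epochs from the list above
TRAIN_DAYS = 10        # "~10 days", so the result is only approximate


def sentences_per_second(sentences, epochs, days):
    """Average number of sentences processed per second over the whole run."""
    total_seen = sentences * epochs
    return total_seen / (days * 24 * 60 * 60)


rate = sentences_per_second(SENTENCES, EPOCHS, TRAIN_DAYS)
print(f"{rate:.1f} sentences/s")  # about 24.3 on the single T4
```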
I followed the great tutorial from the Hugging Face team:
[How to train a new language model from scratch using Transformers and Tokenizers](https://huggingface.co/blog/how-to-train)
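For orientation, the tokenizer step of that tutorial can be sketched as below. It trains a byte-level BPE tokenizer with the `tokenizers` library, using the vocab size listed in this card. The special tokens and `min_frequency=2` follow the tutorial; whether BR_BERTo used exactly these settings is an assumption, and `train_tokenizer` plus the file paths passed to it are hypothetical.

```python
import os

VOCAB_SIZE = 150_000  # from this card
# Special tokens as used in the tutorial (assumed to apply here as well).
SPECIAL_TOKENS = ["<s>", "<pad>", "</s>", "<unk>", "<mask>"]


def train_tokenizer(files, out_dir, vocab_size=VOCAB_SIZE):
    """Train a byte-level BPE tokenizer on plain-text files and save it to out_dir."""
    # Imported lazily so the constants above can be inspected without the library.
    from tokenizers import ByteLevelBPETokenizer

    tokenizer = ByteLevelBPETokenizer()
    tokenizer.train(
        files=files,
        vocab_size=vocab_size,
        min_frequency=2,
        special_tokens=SPECIAL_TOKENS,
    )
    os.makedirs(out_dir, exist_ok=True)
    tokenizer.save_model(out_dir)  # writes vocab.json and merges.txt
    return tokenizer
```

The saved `vocab.json` and `merges.txt` are what the tutorial then loads when building the RoBERTa tokenizer for pretraining.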
More info here:
[BR_BERTo](https://github.com/rdenadai/BR-BERTo)