en-toki-mt
This model is a fine-tuned version of Helsinki-NLP/opus-mt-en-ROMANCE, trained on the English-toki pona sentence pairs from Tatoeba.
Model description
toki pona is a minimalist constructed language created by Sonja Lang. It was first published online in 2001, and the official book (known as pu) followed in 2014. The language has a very small vocabulary (~130 words) and a very simple grammar.
Intended uses & limitations
This model is intended for translating English text into toki pona.
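A minimal usage sketch, assuming the checkpoint is published as `ckb/en-toki-mt` and loads through the standard `transformers` translation pipeline. The helper names `load_translator` and `translate` are illustrative and not part of the model. Note that the base model expects a target-language prefix token of the form `>>id<<` on each input; this card does not say whether the fine-tuned checkpoint still needs one, so the sketch passes sentences through unchanged.

```python
from typing import Callable, Dict, List

# Repository id as listed on the model page (assumed to be loadable as-is).
MODEL_ID = "ckb/en-toki-mt"


def load_translator(model_id: str = MODEL_ID) -> Callable[..., List[Dict[str, str]]]:
    """Build a translation pipeline; weights are downloaded on first use."""
    # Imported lazily so translate() can be used or tested without transformers.
    from transformers import pipeline

    return pipeline("translation", model=model_id)


def translate(
    sentences: List[str],
    translator: Callable[..., List[Dict[str, str]]],
) -> List[str]:
    """Translate a batch of English sentences and return plain strings."""
    outputs = translator(sentences)
    # The translation pipeline returns one dict per input sentence.
    return [item["translation_text"] for item in outputs]


# Example (requires network access and the transformers package):
#   translator = load_translator()
#   print(translate(["I love you.", "The water is good."], translator))
```

Keeping the pipeline behind a plain callable makes it easy to swap in a stub when testing surrounding code without downloading the model.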
Training and evaluation data
The training data consists of all English-toki pona sentence pairs on Tatoeba (~20,000 pairs), with no filtering applied. Because this dataset mostly contains only the core words (pu), the model may produce inaccurate results when it encounters more complex words. The model achieved a BLEU score of 54 on the test set.
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
- mixed_precision_training: Native AMP
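The list above could be mapped onto `transformers` training arguments roughly as follows. This is a sketch under assumptions: the card does not say which trainer class was used, so `Seq2SeqTrainingArguments` is a guess, the batch sizes are taken to be per-device values, and the output directory name is hypothetical. `fp16=True` corresponds to Native AMP and needs a CUDA device.

```python
from typing import Any, Dict

# Hyperparameters copied from the list above, keyed by the argument names
# that transformers' TrainingArguments uses for them.
TRAINING_KWARGS: Dict[str, Any] = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,  # assumed per-device, not total
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 10,
    "fp16": True,  # Native AMP mixed precision; requires a GPU
}


def build_training_args(output_dir: str = "en-toki-mt"):
    """Create training arguments (output_dir is a hypothetical name)."""
    # Imported lazily so the dictionary can be inspected without transformers.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```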
Framework versions
- Transformers 4.20.1
- PyTorch 1.11.0
- Datasets 2.3.2
- Tokenizers 0.12.1
Model tree for ckb/en-toki-mt
Base model
Helsinki-NLP/opus-mt-en-ROMANCE