---
language:
- 'zh'
- 'en'
tags:
- translation
- game
- cultivation
license: 'cc-by-nc-4.0'
datasets:
- Custom
metrics:
- BLEU
---
This is a fine-tuned version of Facebook/M2M100.
It has been trained on a parallel corpus built from translations of several Chinese video games, all of which are human or fan translations.
Sample generation script:
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned checkpoint from a local directory.
tokenizer = AutoTokenizer.from_pretrained(r"path\to\checkpoint")
model = AutoModelForSeq2SeqLM.from_pretrained(r"path\to\checkpoint")
tokenizer.src_lang = "zh"
tokenizer.tgt_lang = "en"
test_string = "地阶上品遁术,施展后便可立于所持之剑上,以极快的速度自由飞行。"
inputs = tokenizer(test_string, return_tensors="pt")
# M2M100 selects the output language through the forced first token.
# do_sample=True makes the output vary between runs.
translated_tokens = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.get_lang_id("en"),
    num_beams=10,
    do_sample=True,
)
translation = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
print("CH : ", test_string, " // EN : ", translation)
```
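To translate many lines at once (for example, a whole game text file), the script above could be extended roughly as follows. This is only a sketch, not part of the original release: the helper names `chunked` and `translate_batch`, the batch size of 8, and the use of deterministic beam search with 5 beams are all assumptions chosen for illustration. The checkpoint path is still a placeholder that you need to supply.

```python
from typing import Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_batch(
    lines: List[str],
    checkpoint: str,
    batch_size: int = 8,  # assumed value; tune to available memory
    num_beams: int = 5,   # assumed value; plain beam search, no sampling
) -> List[str]:
    """Translate Chinese lines to English, one padded batch at a time."""
    if not lines:
        return []

    # Imported here so the chunking helper can be used without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
    tokenizer.src_lang = "zh"

    results: List[str] = []
    for batch in chunked(lines, batch_size):
        # Pad to the longest line in the batch; truncate overly long input.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        tokens = model.generate(
            **inputs,
            forced_bos_token_id=tokenizer.get_lang_id("en"),
            num_beams=num_beams,
        )
        results.extend(tokenizer.batch_decode(tokens, skip_special_tokens=True))
    return results
```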
Translation samples and a comparison with Google Translate and DeepL: [here](https://docs.google.com/spreadsheets/d/1J1i9P0nyI9q5-m2iZGSUatt3ZdHSxU8NOp9tJH7wxsk/edit?usp=sharing)