Traditional Chinese-to-English Translator
This model translates Traditional Chinese sentences into English.
It is a fine-tuned version of the Helsinki-NLP/opus-mt-zh-en model, trained on the agentlans/en-zhtw-google-translate dataset.
Intended Uses & Limitations
Intended Use Cases
- Translating individual sentences from Traditional Chinese to English.
- Applications requiring understanding of Taiwanese-style Traditional Chinese.
Limitations
- Optimized for single-sentence translation; performance may degrade on longer texts unless they are first segmented into sentences (see the segmentation sketch after the usage example below).
- May struggle with Taiwanese slang, idioms, and newly emerging expressions.
- Can occasionally produce incomprehensible English due to challenges with anaphora and sentence structure differences.
- Specificity issues may occur; for example, the Chinese term for "outpost" might be mistranslated as "post office," or "fur trader" as "leather dealer."
繁體中文至英文翻譯模型
該模型將繁體中文翻譯成英文。
它是 Helsinki-NLP/opus-mt-zh-en 模型的微調版本,使用 agentlans/en-zhtw-google-translate 資料集進行訓練。
預期用途與限制
預期用途
- 將單句繁體中文翻譯成英文
- 適用於需要理解台灣用語和語氣的應用場景
限制
- 模型針對單句翻譯進行最佳化;若處理長文但未妥善切句,可能影響翻譯品質
- 對台灣俚語、成語或新興用語的理解能力有限
- 偶爾因語序與結構差異,產生不自然或難以理解的英文句子
- 可能出現語意偏差
How to use / 如何使用
from transformers import pipeline
model_checkpoint = "agentlans/zhtw-en"
translator = pipeline("translation", model=model_checkpoint)
# 摘自中文維基百科的今日文章
# From Chinese Wikipedia's article of the day
translator("《阿奇大戰鐵血戰士》是2015年4至7月黑馬漫畫和阿奇漫畫在美國發行的四期限量連環漫畫圖書,由亞歷克斯·德坎皮創作,費爾南多·魯伊斯繪圖,屬跨公司跨界作品。")[0]['translation_text']
# 輸出
# Output
# The Iron Blood Warriors of the Achilles War is a four-term, serial comic book on black horse comics and Achilles comics released in the United States from April to July 2015. It was created by Alex De Campi and illustrated by Fernando Ruiz. It is a cross-corporate cross-border work.
# 與我自己的黃金標準翻譯比較:
# Compare with my own gold standard translation:
# "Archie vs. Predator" is a limited four-issue comic book series published by Black Horse and Archie Comics in the United States from April to July 2015. It was created by Alex de Campi and drawn by Fernando Ruiz. It's a crossover work.
Training procedure / 訓練過程
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 5.0
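As a rough reproduction guide, the sketch below shows how these hyperparameters could be plugged into a standard Seq2SeqTrainer fine-tuning run. The dataset column names ("zh" and "en"), the split names, and the per-epoch evaluation setting are assumptions inferred from the results table, not the exact training script.

from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

base = "Helsinki-NLP/opus-mt-zh-en"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSeq2SeqLM.from_pretrained(base)

dataset = load_dataset("agentlans/en-zhtw-google-translate")

def preprocess(batch):
    # Column names are assumptions; check the dataset card for the real ones.
    return tokenizer(batch["zh"], text_target=batch["en"], truncation=True)

tokenized = dataset.map(preprocess, batched=True)

args = Seq2SeqTrainingArguments(
    output_dir="zhtw-en",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",        # AdamW with the default betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="linear",
    num_train_epochs=5.0,
    eval_strategy="epoch",      # per-epoch validation loss, as in the results table
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],  # split name is an assumption
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()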
Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 1.3772 | 1.0 | 99952 | 1.2156 | 54090088 |
| 1.2001 | 2.0 | 199904 | 1.1147 | 108157960 |
| 1.0933 | 3.0 | 299856 | 1.0592 | 162248288 |
| 0.9897 | 4.0 | 399808 | 1.0107 | 216341560 |
| 0.9016 | 5.0 | 499760 | 0.9878 | 270444104 |
Framework versions
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0