English-to-Traditional Chinese Translator
This model is a fine-tuned version of Helsinki-NLP/opus-mt-en-zh, trained on the agentlans/en-zhtw dataset.
It is optimized to produce Traditional Chinese translations by default, enhancing the naturalness and fluency of the output.
Model Description
- Input: English text only
- Output: Traditional Chinese translation
英文至繁體中文翻譯模型
本模型為 Helsinki-NLP/opus-mt-en-zh 的微調版本,使用 agentlans/en-zhtw 資料集進行訓練。
模型已針對輸出繁體中文進行最佳化,提升了翻譯結果的自然度與流暢性。
模型說明
- 輸入: 僅支援英文文本
- 輸出: 繁體中文翻譯
How to use / 如何使用
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub
model_checkpoint = "agentlans/en-zhtw"
translator = pipeline("translation", model=model_checkpoint)

# Translate a batch of short English inputs (one or two sentences each)
translator(
    [
        "Even if you spend a day in Windsor you'll notice that it's a very multicultural city, yet still retaining a small town feel.",
        "Its main waterfront park stretches about 5 km (3.1 mi), from the 1929 Ambassador suspension bridge past the contemporary Windsor Sculpture Park.",
    ]
)
# [{'translation_text': '即使到風車旅遊一天,也會發現風車是個非常多元化的城市,但還是保留了一個小鎮的感覺。'},
# {'translation_text': '從一九二九年的大使吊橋到現代風景雕塑公園,主要水面公園寬約五公里(三・一米)。'}]
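The sample output above uses ASCII commas inside Chinese text. A minimal post-processing sketch follows, assuming it is acceptable to swap an ASCII mark for its full-width form whenever it directly follows a Chinese character. The names `normalize_punctuation` and `translate_and_normalize` are hypothetical helpers and are not part of the model or of `transformers`.

```python
import re

# ASCII punctuation mapped to the full-width forms used in Chinese text.
FULLWIDTH = {",": ",", ";": ";", ":": ":", "?": "?", "!": "!"}

# Match an ASCII mark only when it directly follows a CJK ideograph, so that
# numbers such as "3,000" and embedded English text are left untouched.
_PATTERN = re.compile(r"(?<=[\u4e00-\u9fff])[,;:?!]")


def normalize_punctuation(text: str) -> str:
    """Replace ASCII punctuation that follows a Chinese character."""
    return _PATTERN.sub(lambda m: FULLWIDTH[m.group(0)], text)


def translate_and_normalize(translator, sentences):
    """Translate a batch, then normalize punctuation in each result.

    `translator` is any callable that returns a list of dicts with a
    'translation_text' key, as the translation pipeline does.
    """
    return [normalize_punctuation(r["translation_text"]) for r in translator(sentences)]
```

The period is deliberately left out of the mapping, since an ASCII "." can also be a decimal point or part of an abbreviation.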
More examples / 更多範例
English: Now, any physical punishment is prohibited by Judaism - as no proper judicial process could be provided until the Holy Temple is rebuilt by the Messiah.
Chinese: 如今,猶太教禁止任何體罰——除非聖殿由救世主重建,否則任何正當的司法程式都無法適用。
English: Usually, these neglected diseases affect uncared-for populations and vulnerable groups such as indigenous populations, rural inhabitants, the elderly, women living in poverty, and children.
Chinese: 通常,這些被忽視的疾病會影響到群眾和弱勢群體,如土著、農村居民、老年人、生活在貧困中的婦女和兒童。
English: However, translating data from field collection projects into the required spatial data formats and insuring proper data model schemas are used while preserving data integrity can be challenging.
Chinese: 但是,將田野收集專案的資料轉化為必要的空間資料格式,並且在保證資料完整性的同時對恰當的資料模型進行測試是極具挑戰性的。
English: The project's chief investigator, Dr Richard Gordon of UQ's School of Biomedical Sciences, said the research would evaluate if forms of the drug could block brain inflammation associated with the progression of Parkinson's disease.
Chinese: 該專案的總調查員、 UQ 生物醫學院的戈登博士表示,該研究將評估該藥物的種類是否能夠阻止與帕金森病的進化有關的大腦發炎。
English: What separates the charts is what stations or stores each chart uses.
Chinese: 圖表的區別是每張圖表使用的電台或商店。
English: As governments around the world struggle with the crushing economic downturn and increasingly scarce natural resources, leaders at the grassroots level are continuing the critical work that often goes unnoticed, promoting environmental health, civil society, and reform in the face of great hardship.
Chinese: 隨著世界各國政府在經濟衰退的嚴重和自然資源日益稀缺的情況下苦苦掙扎,基層領導人繼續著往往默默無聞的關鍵工作,在艱苦困難面前促進環境健康、公民社會和改革。
English: If the Palestinian government does not want Palestinians to work in the settlements, why hasn't it provided them with alternative jobs or financial compensation?
Chinese: 如果巴勒斯坦政府不想讓巴勒斯坦人在猶太人定居點工作,那它為什麼沒有為他們提供替代工作或經濟補償呢?
English: The maps were usually the large folding kind, not easily collected in a book, even one as big as The Early Resorts of Minnesota.
Chinese: 地圖通常是摺疊型的大片,不易在書中找到,即使是像明尼蘇達州早期避難所那樣大的地圖。
English: A mask and spacer system, called AeroKat®, has been invented to enable cats to use inhalers or puffers.
Chinese: 一種被稱為AeroKatá的面具和空間系統已經被髮明,以便貓可以使用吸入器或噴霧器。
English: Onion salt is a combination of salt and onion powder.
Chinese: 洋蔥鹽是鹽和洋蔥粉混合而成。
Limitations / 限制
Limitations
- Works well only on English inputs of one or two sentences.
- Struggles with English spelling, names, abbreviations, and especially technical terminology.
  - For example, it translates "Windsor" (the city) as "windmill" (風車).
  - It also renders "3.1 mi" (miles) as "3.1 metres".
- Uses unusual punctuation, such as the English comma instead of the Chinese comma.
- Has difficulty understanding context.
  - For example, it interprets "called" in the sentence "Abraham Lincoln called Frederick Douglass to discuss the abolition of slavery" as a telephone call when translating into Chinese.
  - As a result, it may generate inaccurate information or omit important details.
- Sometimes uses incorrect words because the base model was trained primarily on Simplified Chinese, whose vocabulary does not always correspond directly to Traditional Chinese.
限制
- 僅適用於處理一至兩句英文句子的輸入,處理較長段落時效果有限。
- 難以準確掌握英語拼字、專有名詞及縮寫,尤其在處理技術術語時表現不佳。
  - 例如,將地名「Windsor」誤譯為「風車」。
  - 將「3.1 miles」(英哩)錯譯為「3.1 公尺」。
- 常出現標點符號使用不當的情況,例如以英文逗號取代中文逗號。
- 對語境的理解能力有限。
  - 例如,將句子「亞伯拉罕·林肯致電弗雷德里克·道格拉斯討論廢奴」中的 "call" 理解為打電話,而非「呼籲」或「拜訪」。
  - 可能導致資訊不準確或遺漏重要細節。
- 由於基礎模型主要以簡體中文語料訓練,有時會使用不自然或錯誤的詞語,簡體與繁體用語之間也未必能精確對應。
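Because the model works best on one or two sentences at a time, longer passages can be split before translation. The sketch below is one possible workaround, not the author's method. It assumes a naive regex split on `.`, `!`, and `?` is good enough, which it is not for abbreviations such as "Dr." or "e.g.". The names `chunk_sentences` and `translate_long` are hypothetical.

```python
import re


def chunk_sentences(text: str, max_sentences: int = 2) -> list[str]:
    """Split text into chunks of at most `max_sentences` sentences.

    Naive assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [
        " ".join(sentences[i : i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]


def translate_long(text: str, translator) -> str:
    """Translate a longer passage chunk by chunk and join the results.

    `translator` is any callable that takes a list of strings and returns a
    list of dicts with a 'translation_text' key, as the pipeline does.
    """
    chunks = chunk_sentences(text)
    if not chunks:
        return ""
    results = translator(chunks)
    return "".join(r["translation_text"] for r in results)
```

With the pipeline from the usage section, this would be called as `translate_long(long_english_text, translator)`. Splitting removes cross-sentence context, so the context errors listed above may become more frequent rather than less.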
Training procedure / 訓練過程
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 10.0
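For readers who want to reproduce a similar run, the hyperparameters above can be expressed as keyword arguments for `Seq2SeqTrainingArguments` in `transformers`. This is a sketch under assumptions: the card does not show the training script, the mapping of `train_batch_size` and `eval_batch_size` to the per-device arguments is assumed, and the output directory name is hypothetical.

```python
# Hyperparameters from the list above, keyed by the corresponding
# TrainingArguments field names (the batch-size mapping is an assumption).
HYPERPARAMETERS = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 10.0,
}


def make_training_args(output_dir: str = "en-zhtw-finetune"):
    """Build training arguments; `output_dir` is a hypothetical path."""
    # Imported lazily so the dictionary can be inspected without transformers.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **HYPERPARAMETERS)
```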
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
2.332 | 1.0 | 41656 | 2.1864 | 22507346 |
2.178 | 2.0 | 83312 | 2.0993 | 45031706 |
2.0336 | 3.0 | 124968 | 2.0502 | 67539658 |
1.9402 | 4.0 | 166624 | 2.0230 | 90066441 |
1.8757 | 5.0 | 208280 | 2.0003 | 112577027 |
1.8059 | 6.0 | 249936 | 1.9898 | 135081404 |
1.7161 | 7.0 | 291592 | 1.9742 | 157627139 |
1.6711 | 8.0 | 333248 | 1.9691 | 180145393 |
1.6226 | 9.0 | 374904 | 1.9653 | 202654353 |
1.5486 | 10.0 | 416560 | 1.9615 | 225160303 |
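A few quantities can be derived from the table. The sketch below computes the average number of input tokens per optimizer step in the first epoch, the per-epoch change in validation loss, and an estimate of the training set size. The size estimate assumes a single device and no gradient accumulation, neither of which the card states.

```python
# Values copied from the training results table.
steps_per_epoch = 41656
train_batch_size = 8
tokens_after_epoch_1 = 22507346
validation_loss = [2.1864, 2.0993, 2.0502, 2.0230, 2.0003,
                   1.9898, 1.9742, 1.9691, 1.9653, 1.9615]

# Average input tokens processed per optimizer step in the first epoch.
tokens_per_step = tokens_after_epoch_1 / steps_per_epoch

# Estimated training examples per epoch (assumes one device and no
# gradient accumulation, so each step consumes exactly one batch).
examples_per_epoch = steps_per_epoch * train_batch_size

# Change in validation loss from each epoch to the next.
loss_deltas = [round(b - a, 4) for a, b in zip(validation_loss, validation_loss[1:])]

print(f"tokens per step: {tokens_per_step:.1f}")
print(f"estimated examples per epoch: {examples_per_epoch}")
print(f"validation loss deltas: {loss_deltas}")
```

Validation loss falls in every epoch, but the improvement shrinks from about 0.09 after the first epoch to about 0.004 at the end, while training loss keeps dropping.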
Framework versions
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0