indiejoseph/bert-base-cantonese

Generated from Trainer


indiejoseph committed on Nov 22, 2023

Commit d9b4729 (1 parent: b130793)

Update README.md

Files changed (1)

README.md +10 -2


@@ -1,9 +1,17 @@
 ---
 tags:
 - generated_from_trainer
 model-index:
 - name: bert-base-cantonese
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -11,11 +19,11 @@ should probably proofread and complete it, then remove this comment. -->
 # bert-base-cantonese
-This model was trained from scratch on an unknown dataset.
 ## Model description
-More information needed
 ## Intended uses & limitations

 ---
+base_model: bert-base-chinese
 tags:
 - generated_from_trainer
 model-index:
 - name: bert-base-cantonese
   results: []
+license: cc-by-4.0
+language:
+  - yue
+pipeline_tag: fill-mask
+widget:
+  - text: 香港原本[MASK]一個人煙稀少嘅漁港。
+    example_title: 係
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 # bert-base-cantonese
+This model is a continued pre-training of bert-base-chinese on a Cantonese Common Crawl dataset of 198M tokens.
 ## Model description
+The vocabulary has been extended with 500 additional Chinese characters that are very common in Cantonese, such as 冧, 噉, 麪, 笪, 冚, and 乸.
 ## Intended uses & limitations
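
The new front matter sets `pipeline_tag: fill-mask` and supplies a widget sentence, so the model can presumably be queried the same way locally. The following is a minimal sketch, not something taken from the README: it assumes the `transformers` library is installed, that the checkpoint is published under the repo id shown at the top of this page, and that the hypothetical helper `top_predictions` is only an illustrative wrapper around the standard fill-mask pipeline.

```python
# Repo id as shown on this page; widget sentence copied from the README front matter.
MODEL_ID = "indiejoseph/bert-base-cantonese"
WIDGET_TEXT = "香港原本[MASK]一個人煙稀少嘅漁港。"


def top_predictions(text: str, model_id: str = MODEL_ID, top_k: int = 5):
    """Return (token, score) pairs for the single [MASK] in `text`.

    Hypothetical helper for illustration; it downloads the checkpoint
    on first use, so it needs network access.
    """
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_id)
    # With exactly one [MASK], the pipeline returns a list of dicts
    # containing "token_str" and "score" among other keys.
    return [(r["token_str"], r["score"]) for r in fill_mask(text, top_k=top_k)]
```

Calling `top_predictions(WIDGET_TEXT)` would list candidate characters for the masked position. The widget's `example_title` suggests 係 as the intended completion, but the actual ranking depends on the published weights.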