conan1024hao committed c2dbdac (parent: fea7bb4): Update README.md
README.md:
### Model description
- This model was trained on **ZH, JA, and KO** Wikipedia (5 epochs).
### How to use
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("conan1024hao/cjkbert-small")
model = AutoModelForMaskedLM.from_pretrained("conan1024hao/cjkbert-small")
```
- You don't need any text segmentation before fine-tuning on downstream tasks.
- (Though you may obtain better results if you apply morphological analysis to the data before fine-tuning.)
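- For example, masked-word prediction works directly on raw, unsegmented text. A minimal sketch using the `fill-mask` pipeline (the example sentence and `top_k` value are arbitrary illustrations, not from the original README):

```python
from transformers import pipeline

# Load the model into a fill-mask pipeline; the input needs no pre-segmentation.
fill_mask = pipeline("fill-mask", model="conan1024hao/cjkbert-small")

# Raw Japanese text with a single [MASK] token (an arbitrary example sentence).
for prediction in fill_mask("東京は日本の[MASK]都です。", top_k=3):
    print(prediction["token_str"], prediction["score"])
```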
### Morphological analysis tools
- ZH: For Chinese, we use [LTP](https://github.com/HIT-SCIR/ltp).
- KO: For Korean, we use [KoNLPy](https://github.com/konlpy/konlpy) (Kkma class).
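- If you do apply morphological analysis before fine-tuning, the tools above can be driven from Python. A minimal sketch using KoNLPy's Kkma class, as named above (the example sentence is arbitrary; LTP's API differs across versions, so check its own docs):

```python
from konlpy.tag import Kkma

# Kkma is the KoNLPy analyzer class named above.
kkma = Kkma()

# Split a raw Korean sentence into morphemes (an arbitrary example sentence),
# then rejoin with spaces to produce segmented text for fine-tuning data.
morphemes = kkma.morphs("한국어 형태소 분석기를 사용합니다.")
print(" ".join(morphemes))
```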
### Tokenization
- We use character-based tokenization with a **whole-word-masking** strategy.
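- As a quick sanity check of the character-based vocabulary, tokenizing a short sentence should yield single-character tokens. A minimal sketch (the example sentence is arbitrary):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("conan1024hao/cjkbert-small")

# Character-based tokenization: each CJK character becomes its own token;
# characters outside the 15015-entry vocab map to [UNK].
print(tokenizer.tokenize("我喜欢自然语言处理。"))
```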
### Model size
- vocab_size: 15015