bert-based-chinese / README.md
metadata
datasets:
  - botp/yentinglin-zh_TW_c4
language:
  - zh
pipeline_tag: fill-mask

Model Sources

Uses

Direct Use

This model can be used for masked language modeling (fill-mask): predicting the tokens hidden behind [MASK] in Chinese text.

Training

Training Procedure

  • type_vocab_size: 2
  • vocab_size: 21128
  • num_hidden_layers: 12
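As a quick sanity check, the values listed above can be compared against a loaded model's configuration. The following is a minimal sketch: `LISTED_CONFIG` and `mismatched_fields` are hypothetical names introduced here for illustration, and only the three fields stated in this card are included; any other configuration fields are not given here and are left out.

```python
# Values copied from the list above; nothing else is assumed about the config.
LISTED_CONFIG = {
    "type_vocab_size": 2,
    "vocab_size": 21128,
    "num_hidden_layers": 12,
}


def mismatched_fields(config: dict) -> list:
    """Return the names of listed fields whose value in `config` differs or is missing."""
    return [
        key for key, expected in LISTED_CONFIG.items()
        if config.get(key) != expected
    ]
```

With a model loaded through `transformers`, `mismatched_fields(model.config.to_dict())` should return an empty list if the checkpoint matches this card.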

Training Data

botp/yentinglin-zh_TW_c4

Evaluation

| Dataset \ BERT Pretrain | bert-based-chinese | ckiplab | GufoLab |
| --- | --- | --- | --- |
| 5000 Traditional Chinese Dataset | 0.7183 | 0.6989 | 0.8081 |
| 10000 Sol-Idea Dataset | 0.7874 | 0.7913 | 0.8025 |
| All Datasets | 0.7694 | 0.7678 | 0.8038 |

Results

| Test ID | [MASK] Input | Result Output |
| --- | --- | --- |
| 1 | 今天禮拜[MASK]?我[MASK]是很想[MASK]班。 | 今天禮拜六?我不是很想上班。 |
| 2 | [MASK]灣並[MASK]是[MASK]國不可分割的一部分。 | 臺灣並不是中國不可分割的一部分。 |
| 3 | 如果可以是韋[MASK]安的最新歌[MASK]。 | 如果可以是韋禮安的最新歌曲。 |
| 4 | [MASK]水老[MASK]有賣很多鐵蛋的攤販。 | 淡水老街有賣很多鐵蛋的攤販。 |
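To read off which characters the model predicted in each test, the masked input can be aligned with the output sentence. The sketch below is illustrative only: `predicted_tokens` is a hypothetical helper, and it assumes that each [MASK] stands for exactly one character, which holds for the four examples above.

```python
import re
from typing import List


def predicted_tokens(masked: str, filled: str) -> List[str]:
    """Return the characters that replaced each [MASK], in order.

    Assumes every [MASK] corresponds to exactly one character.
    """
    parts = masked.split("[MASK]")
    # Literal text between masks must match exactly; each mask captures one character.
    pattern = "(.)".join(re.escape(part) for part in parts)
    match = re.fullmatch(pattern, filled)
    if match is None:
        raise ValueError("filled text does not match the masked template")
    return list(match.groups())


print(predicted_tokens("[MASK]水老[MASK]有賣很多鐵蛋的攤販。", "淡水老街有賣很多鐵蛋的攤販。"))
# ['淡', '街']
```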

git-lfs Installation

$ curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
$ sudo apt-get install git-lfs
$ git lfs install
$ pip install huggingface_hub

How to Get Started With the Model

Log In to Hugging Face from the Terminal

$ huggingface-cli login
Token: your own Hugging Face token.

Log In to Hugging Face from a Jupyter Notebook

from huggingface_hub import notebook_login

notebook_login()
Token: your own Hugging Face token.

Python Code

from transformers import AutoTokenizer, AutoModelForMaskedLM

# use_auth_token=True reads the token saved by the login step;
# recent transformers releases accept token=True instead.
tokenizer = AutoTokenizer.from_pretrained("Azion/bert-based-chinese", use_auth_token=True)
model = AutoModelForMaskedLM.from_pretrained("Azion/bert-based-chinese", use_auth_token=True)
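Once the tokenizer and model are loaded, one way to fill masks is through the `fill-mask` pipeline from `transformers`. The following is a sketch under stated assumptions, not the author's evaluation code: `fill_all_masks` and `build_fill_mask` are hypothetical helper names, and each mask is filled with its own top-scoring candidate independently of the others, which may differ from how the results in this card were produced.

```python
from typing import Callable


def fill_all_masks(text: str, fill_mask: Callable) -> str:
    """Replace every [MASK] in `text` with the top-scoring candidate for that position."""
    results = fill_mask(text)
    # With one [MASK] the pipeline returns a flat list of candidate dicts;
    # with several it returns one candidate list per mask. Normalize to the latter.
    if results and isinstance(results[0], dict):
        results = [results]
    for candidates in results:
        best = max(candidates, key=lambda c: c["score"])
        # Masks are filled left to right, one per candidate list.
        text = text.replace("[MASK]", best["token_str"], 1)
    return text


def build_fill_mask(model_id: str = "Azion/bert-based-chinese"):
    """Load the model and wrap it in a fill-mask pipeline (needs network access and a login)."""
    from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_id, use_auth_token=True)
    model = AutoModelForMaskedLM.from_pretrained(model_id, use_auth_token=True)
    return pipeline("fill-mask", model=model, tokenizer=tokenizer)


# Example usage (downloads the model, so it is left commented out):
# fill_mask = build_fill_mask()
# print(fill_all_masks("[MASK]水老[MASK]有賣很多鐵蛋的攤販。", fill_mask))
```

Keeping the pipeline as a parameter of `fill_all_masks` means the substitution logic can be exercised with a stub that returns the same list-of-dicts shape, without downloading the model.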