---
datasets:
  - botp/yentinglin-zh_TW_c4
language:
  - zh
pipeline_tag: fill-mask
---

# bert-based-chinese
| Dataset \ Pretrained BERT | bert-based-chinese | ckiplab | GufoLab |
| --- | --- | --- | --- |
| 5000 Traditional Chinese Dataset | 0.7183 | 0.6989 | 0.8081 |
| 10000 Sol-Idea Dataset | 0.7874 | 0.7913 | 0.8025 |
| All Datasets | 0.7694 | 0.7678 | 0.8038 |

## Model Sources

## Uses

### Direct Use

This model can be used directly for masked language modeling (the `fill-mask` task).
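
For example, the model can be wrapped in a `fill-mask` pipeline. The snippet below is a minimal sketch: the example sentence is illustrative, and `use_auth_token=True` assumes the repository requires authentication.

```python
from transformers import pipeline

# Fill-mask pipeline backed by this checkpoint (authentication assumed to be required)
fill_mask = pipeline("fill-mask", model="Azion/bert-based-chinese", use_auth_token=True)

# Illustrative Traditional Chinese sentence with a single [MASK] token
print(fill_mask("今天的天氣非常[MASK]。"))
```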

## Risks, Limitations and Biases

CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)).

## Training

### Training Procedure

- type_vocab_size: 2
- vocab_size: 21128
- num_hidden_layers: 12
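
A minimal sketch of how these values map onto a Hugging Face `BertConfig`; all other fields are left at their defaults, which is an assumption rather than the exact training configuration.

```python
from transformers import BertConfig

# Only the values listed above are set explicitly; remaining fields use BertConfig defaults
config = BertConfig(
    vocab_size=21128,
    type_vocab_size=2,
    num_hidden_layers=12,
)
```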

### Training Data

botp/yentinglin-zh_TW_c4
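
A sketch of loading this corpus with the `datasets` library; the split name and the streaming flag are assumptions.

```python
from datasets import load_dataset

# Stream the Traditional Chinese C4 corpus (split name assumed to be "train")
dataset = load_dataset("botp/yentinglin-zh_TW_c4", split="train", streaming=True)
```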

## Evaluation

### Results

[More Information Needed]

## How to Get Started With the Model

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the tokenizer and the masked-language model
# (use_auth_token=True is needed if the repository is private or gated)
tokenizer = AutoTokenizer.from_pretrained("Azion/bert-based-chinese", use_auth_token=True)
model = AutoModelForMaskedLM.from_pretrained("Azion/bert-based-chinese", use_auth_token=True)
```
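
Once the tokenizer and model are loaded, a masked position can be filled directly. The snippet below is a minimal sketch; the example sentence and the PyTorch-based decoding are illustrative assumptions, not part of the original card.

```python
import torch

# Hypothetical example sentence containing one [MASK] token
text = "台北是台灣的[MASK]市。"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and take the highest-scoring vocabulary id
mask_positions = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))
```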