---
datasets:
- botp/yentinglin-zh_TW_c4
language:
- zh
pipeline_tag: fill-mask
---

Evaluation results across pretrained Chinese BERT models:

| Dataset \ Pretrained model | bert-based-chinese | ckiplab | GufoLab |
|---|---|---|---|
| 5000 Traditional Chinese Dataset | 0.7183 | 0.6989 | 0.8081 |
| 10000 Sol-Idea Dataset | 0.7874 | 0.7913 | 0.8025 |
| All Datasets | 0.7694 | 0.7678 | 0.8038 |
## Model Sources

- Paper: [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) (Devlin et al., 2018)
## Uses

### Direct Use

This model can be used for masked language modeling (fill-mask) on Traditional Chinese text.
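A minimal fill-mask sketch, assuming the checkpoint is accessible with your Hugging Face credentials; the example sentence is illustrative and not part of the original card.

```python
from transformers import pipeline

# Load this checkpoint through the fill-mask pipeline; use_auth_token=True
# reuses your stored Hugging Face token if the repository requires it.
fill_mask = pipeline("fill-mask", model="Azion/bert-based-chinese", use_auth_token=True)

# Illustrative Traditional Chinese sentence with a single [MASK] token.
for prediction in fill_mask("今天天氣真[MASK]。"):
    print(prediction["token_str"], round(prediction["score"], 4))
```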
## Risks, Limitations and Biases
CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.
Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)).
## Training

### Training Procedure
- type_vocab_size: 2
- vocab_size: 21128
- num_hidden_layers: 12
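As a rough sketch, these hyperparameters map onto a Hugging Face `BertConfig` as shown below; every argument not listed above falls back to the library defaults and is an assumption rather than something documented by this card.

```python
from transformers import BertConfig, BertForMaskedLM

# Values taken from the list above; all other settings are transformers defaults.
config = BertConfig(
    vocab_size=21128,       # same vocabulary size as bert-base-chinese
    type_vocab_size=2,
    num_hidden_layers=12,
)

# Instantiating from the config yields randomly initialised weights; use
# from_pretrained (see "How to Get Started" below) to load the released checkpoint.
model = BertForMaskedLM(config)
```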
### Training Data

[botp/yentinglin-zh_TW_c4](https://huggingface.co/datasets/botp/yentinglin-zh_TW_c4), a Traditional Chinese (zh-TW) C4 corpus.
## Evaluation

### Results
See the evaluation table at the top of this card.
## How to Get Started With the Model

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("Azion/bert-based-chinese", use_auth_token=True)
model = AutoModelForMaskedLM.from_pretrained("Azion/bert-based-chinese", use_auth_token=True)
```
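For completeness, a minimal sketch of scoring a masked position manually with the `model` and `tokenizer` loaded above; the example sentence and the top-5 reporting are illustrative assumptions, not part of the original card.

```python
import torch

# Illustrative Traditional Chinese sentence containing a single [MASK] token.
inputs = tokenizer("台北是台灣的[MASK]都。", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and report the five highest-scoring tokens.
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_index].topk(5).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```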