---
datasets:
- botp/yentinglin-zh_TW_c4
language:
- zh
pipeline_tag: fill-mask
---

Evaluation results across pretrained Chinese BERT models:

| Dataset \ Pretrained model | bert-based-chinese | ckiplab | GufoLab |
|---|---|---|---|
| 5000 Traditional Chinese Dataset | 0.7183 | 0.6989 | 0.8081 |
| 10000 Sol-Idea Dataset | 0.7874 | 0.7913 | 0.8025 |
| All Datasets | 0.7694 | 0.7678 | 0.8038 |
## Model Sources

- Paper: [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) (Devlin et al., 2018)
## Uses

### Direct Use

This model can be used for masked language modeling (fill-mask) on Traditional Chinese text.
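A minimal fill-mask sketch, assuming the checkpoint is accessible with your Hugging Face credentials; the example sentence is illustrative and not part of the original card.

```python
from transformers import pipeline

# Load this checkpoint through the fill-mask pipeline; use_auth_token=True
# reuses your stored Hugging Face token if the repository requires it.
fill_mask = pipeline("fill-mask", model="Azion/bert-based-chinese", use_auth_token=True)

# Illustrative Traditional Chinese sentence with a single [MASK] token.
for prediction in fill_mask("今天天氣真[MASK]。"):
    print(prediction["token_str"], round(prediction["score"], 4))
```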
## Risks, Limitations and Biases
CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.
Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)).
## Training

### Training Procedure
- type_vocab_size: 2
- vocab_size: 21128
- num_hidden_layers: 12
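As a rough sketch, these hyperparameters map onto a Hugging Face `BertConfig` as shown below; every argument not listed above falls back to the library defaults and is an assumption rather than something documented by this card.

```python
from transformers import BertConfig, BertForMaskedLM

# Values taken from the list above; all other settings are transformers defaults.
config = BertConfig(
    vocab_size=21128,       # same vocabulary size as bert-base-chinese
    type_vocab_size=2,
    num_hidden_layers=12,
)

# Instantiating from the config yields randomly initialised weights; use
# from_pretrained (see "How to Get Started" below) to load the released checkpoint.
model = BertForMaskedLM(config)
```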
### Training Data

[botp/yentinglin-zh_TW_c4](https://huggingface.co/datasets/botp/yentinglin-zh_TW_c4), a Traditional Chinese (zh-TW) C4 corpus.
## Evaluation

### Results
See the evaluation table at the top of this card.
## How to Get Started With the Model

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("Azion/bert-based-chinese", use_auth_token=True)
model = AutoModelForMaskedLM.from_pretrained("Azion/bert-based-chinese", use_auth_token=True)
```
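For completeness, a minimal sketch of scoring a masked position manually with the `model` and `tokenizer` loaded above; the example sentence and the top-5 reporting are illustrative assumptions, not part of the original card.

```python
import torch

# Illustrative Traditional Chinese sentence containing a single [MASK] token.
inputs = tokenizer("台北是台灣的[MASK]都。", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and report the five highest-scoring tokens.
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_index].topk(5).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```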