Model Sources

Uses

Direct Use

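This model can be used for masked language modeling: given Chinese text containing one or more [MASK] tokens, it predicts the token at each masked position.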

Training

Training Procedure

  • type_vocab_size: 2
  • vocab_size: 21128
  • num_hidden_layers: 12
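As a rough illustration, the three listed values could be passed to a configuration object as sketched below. This assumes the model follows the standard BERT architecture and that `BertConfig` from `transformers` is the right class; every hyperparameter the card does not list is left at the library default, which may not match the actual checkpoint. `LISTED_PARAMS` and `build_config` are hypothetical names used only for this example.

```python
# Values copied from the model card. Nothing else is known from the card,
# so all other hyperparameters fall back to library defaults (an assumption).
LISTED_PARAMS = {
    "type_vocab_size": 2,
    "vocab_size": 21128,
    "num_hidden_layers": 12,
}


def build_config():
    """Return a BertConfig carrying the listed values (hypothetical helper)."""
    # Imported lazily so LISTED_PARAMS can be inspected without transformers installed.
    from transformers import BertConfig

    return BertConfig(**LISTED_PARAMS)
```

Calling `build_config()` requires `transformers` to be installed; the configuration stored with the published checkpoint remains the authoritative source.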

Training Data

botp/yentinglin-zh_TW_c4

Evaluation

| Dataset / BERT pretrained model | bert-based-chinese | ckiplab | GufoLab |
|---|---|---|---|
| 5000 Traditional Chinese Dataset | 0.7183 | 0.6989 | 0.8081 |
| 10000 Sol-Idea Dataset | 0.7874 | 0.7913 | 0.8025 |
| All datasets | 0.7694 | 0.7678 | 0.8038 |

Results

| Test ID | [MASK] Input | Result Output |
|---|---|---|
| 1 | 今天禮拜[MASK]?我[MASK]是很想[MASK]班。 | 今天禮拜六?我不是很想上班。 |
| 2 | [MASK]灣並[MASK]是[MASK]國不可分割的一部分。 | 臺灣並不是中國不可分割的一部分。 |
| 3 | 如果可以是韋[MASK]安的最新歌[MASK]。 | 如果可以是韋禮安的最新歌曲。 |
| 4 | [MASK]水老[MASK]有賣很多鐵蛋的攤販。 | 淡水老街有賣很多鐵蛋的攤販。 |

git-lfs Installation

$ curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
$ sudo apt-get install git-lfs
$ git lfs install
$ pip install huggingface_hub

How to Get Started With the Model

Log In to Hugging Face from the Terminal

$ huggingface-cli login
Token: your own Hugging Face access token.

Log In to Hugging Face from a Jupyter Notebook

from huggingface_hub import notebook_login

notebook_login()
Token: your own Hugging Face access token.

Python Code

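from transformers import AutoTokenizer, AutoModelForMaskedLM

# use_auth_token=True reuses the token saved by the login step above.
# Newer transformers releases accept token=True for the same purpose.
tokenizer = AutoTokenizer.from_pretrained("Azion/bert-based-chinese", use_auth_token=True)

model = AutoModelForMaskedLM.from_pretrained("Azion/bert-based-chinese", use_auth_token=True)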
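Once the tokenizer and model are loaded, one possible way to fill the masked positions is the `fill-mask` pipeline from `transformers`. The sketch below is not the author's evaluation code. It assumes the mask token is the literal string `[MASK]` and that each position is filled independently with its top-scoring candidate; `build_fill_mask` and `fill_all_masks` are hypothetical helper names.

```python
MASK = "[MASK]"  # assumed mask token, as written in the examples on this card


def build_fill_mask(model, tokenizer):
    """Wrap an already loaded model and tokenizer in a fill-mask pipeline."""
    # Imported lazily so fill_all_masks can be used with any compatible callable.
    from transformers import pipeline

    return pipeline("fill-mask", model=model, tokenizer=tokenizer)


def fill_all_masks(text, fill_mask):
    """Replace each [MASK] in `text`, left to right, with the top candidate."""
    results = fill_mask(text)
    # With a single mask the pipeline returns one list of candidate dicts;
    # with several masks it returns one such list per mask.
    if results and isinstance(results[0], dict):
        results = [results]
    for candidates in results:
        best = candidates[0]["token_str"]  # candidates are sorted by score
        text = text.replace(MASK, best, 1)
    return text
```

With the objects from the snippet above, `fill_all_masks("[MASK]水老[MASK]有賣很多鐵蛋的攤販。", build_fill_mask(model, tokenizer))` would return the completed sentence. Because each mask is predicted independently, the combined output is not guaranteed to be the most likely joint completion.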