Frederick Lee committed on
Commit 0465aa6 · 1 Parent(s): 210bb79

Upload README.md

Files changed (1)
  1. README.md +57 -0
README.md ADDED
@@ -0,0 +1,57 @@
+ ---
+ datasets:
+ - botp/yentinglin-zh_TW_c4
+ language:
+ - zh
+ pipeline_tag: fill-mask
+ ---
+
+ | Dataset \ Pretrained BERT | bert-based-chinese | ckiplab | GufoLab |
+ | ------------- |:-------------:|:-------------:|:-------------:|
+ | 5000 Traditional Chinese Dataset | 0.7183 | 0.6989 | **0.8081** |
+ | 10000 Sol-Idea Dataset | 0.7874 | 0.7913 | **0.8025** |
+ | All Datasets | 0.7694 | 0.7678 | **0.8038** |
+
+ ### Model Sources
+ - **Paper:** [BERT](https://arxiv.org/abs/1810.04805)
+
+ ## Uses
+
+ #### Direct Use
+
+ This model can be used for masked language modeling.
+
+
+
+ ## Risks, Limitations and Biases
+ **CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.**
+
+ Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).
+
+
+ ## Training
+
+ #### Training Procedure
+ * **type_vocab_size:** 2
+ * **vocab_size:** 21128
+ * **num_hidden_layers:** 12
+
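The three values listed under Training Procedure match field names of a BERT configuration. As a minimal sketch, assuming the Hugging Face `transformers` library and its `BertConfig` class, they could be set as shown below. Every field the README does not list falls back to the library default here, which is an assumption and may not match the trained checkpoint.

```python
from transformers import BertConfig

# Only these three values come from the README.
config = BertConfig(
    vocab_size=21128,
    type_vocab_size=2,
    num_hidden_layers=12,
)

# Assumption: fields not listed in the README (hidden size, attention heads,
# and so on) are left at the BertConfig defaults and may differ from the
# configuration actually used for training.
print(config.vocab_size, config.type_vocab_size, config.num_hidden_layers)
```

Building the config this way needs no network access, so it is a quick check that the listed values are mutually valid before comparing them against the `config.json` of the published checkpoint.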
+ #### Training Data
+ botp/yentinglin-zh_TW_c4
+
+ ## Evaluation
+
+ #### Results
+
+ [More Information Needed]
+
+
+ ## How to Get Started With the Model
+ ```python
+ from transformers import AutoTokenizer, AutoModelForMaskedLM
+
+ tokenizer = AutoTokenizer.from_pretrained("EZlee/bert-based-chinese", use_auth_token=True)
+
+ model = AutoModelForMaskedLM.from_pretrained("EZlee/bert-based-chinese", use_auth_token=True)
+
+ ```
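The loading snippet stops once the tokenizer and model are in memory. As a rough sketch of the masked language modeling use named under Direct Use, the helper below ranks candidate tokens for the first `[MASK]` in a sentence. The function name `predict_masked_token` and its `top_k` parameter are illustrative and not part of the README; it assumes PyTorch is installed and that `tokenizer` and `model` were loaded as in the snippet above.

```python
import torch


def predict_masked_token(text, tokenizer, model, top_k=5):
    """Return the top_k (token, probability) pairs for the first mask token in text."""
    inputs = tokenizer(text, return_tensors="pt")

    # Find where the mask token sits in the encoded sequence.
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    mask_positions = is_mask.nonzero(as_tuple=True)[0]
    if mask_positions.numel() == 0:
        raise ValueError(f"text must contain {tokenizer.mask_token}")

    # Score every vocabulary entry at each position without tracking gradients.
    with torch.no_grad():
        logits = model(**inputs).logits

    # Turn the scores at the first mask position into probabilities.
    probs = logits[0, mask_positions[0]].softmax(dim=-1)
    top = torch.topk(probs, top_k)
    tokens = tokenizer.convert_ids_to_tokens(top.indices.tolist())
    return list(zip(tokens, top.values.tolist()))


# Example call (hypothetical sentence; needs the tokenizer and model loaded above):
# predict_masked_token("台北是一座美麗的城[MASK]。", tokenizer, model)
```

The helper takes the tokenizer and model as arguments instead of loading them itself, so it can be reused with the other checkpoints compared in the table.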