Update README.md
README.md
CHANGED
@@ -6,12 +6,6 @@ language:
|
|
6 |
pipeline_tag: fill-mask
|
7 |
---
|
8 |
|
9 |
-
| Dataset\BERT Pretrain | bert-based-chinese | ckiplab | GufoLab |
|
10 |
-
| ------------- |:-------------:|:-------------:|:-------------:|
|
11 |
-
| 5000 Traditional Chinese Dataset | 0.7183 | 0.6989 | **0.8081** |
|
12 |
-
| 10000 Sol-Idea Dataset | 0.7874| 0.7913| **0.8025**|
|
13 |
-
| ALL DataSet | 0.7694| 0.7678| **0.8038**|
|
14 |
-
|
15 |
### Model Sources
|
16 |
- **Paper:** [BERT](https://arxiv.org/abs/1810.04805)
|
17 |
|
@@ -22,13 +16,6 @@ pipeline_tag: fill-mask
|
|
22 |
This model can be used for masked language modeling
|
23 |
|
24 |
|
25 |
-
|
26 |
-
## Risks, Limitations and Biases
|
27 |
-
**CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.**
|
28 |
-
|
29 |
-
Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).
|
30 |
-
|
31 |
-
|
32 |
## Training
|
33 |
|
34 |
#### Training Procedure
|
@@ -41,12 +28,41 @@ botp/yentinglin-zh_TW_c4
|
|
41 |
|
42 |
## Evaluation
|
43 |
|
44 |
-
|
45 |
|
46 |
-
|
47 |
|
48 |
|
49 |
## How to Get Started With the Model
|
50 |
```python
|
51 |
from transformers import AutoTokenizer, AutoModelForMaskedLM
|
52 |
|
|
|
6 |
pipeline_tag: fill-mask
|
7 |
---
|
8 |
|
9 |
### Model Sources
|
10 |
- **Paper:** [BERT](https://arxiv.org/abs/1810.04805)
|
11 |
|
|
|
16 |
This model can be used for masked language modeling
|
17 |
|
18 |
|
19 |
## Training
|
20 |
|
21 |
#### Training Procedure
|
|
|
28 |
|
29 |
## Evaluation
|
30 |
|
31 |
+
| Dataset\BERT Pretrain | bert-based-chinese | ckiplab | GufoLab |
|
32 |
+
| ------------- |:-------------:|:-------------:|:-------------:|
|
33 |
+
| 5000 Traditional Chinese Dataset | 0.7183 | 0.6989 | **0.8081** |
|
34 |
+
| 10000 Sol-Idea Dataset | 0.7874| 0.7913| **0.8025**|
|
35 |
+
| ALL DataSet | 0.7694| 0.7678| **0.8038**|
|
36 |
|
37 |
+
#### Results
|
38 |
|
39 |
+
| Test ID\Results | [MASK] Input | Result Output |
|
40 |
+
| -------------|-------------|-------------|
|
41 |
+
| 1|今天禮拜[MASK]?我[MASK]是很想[MASK]班。|今天禮拜六?我不是很想上班。 |
|
42 |
+
| 2|[MASK]灣並[MASK]是[MASK]國不可分割的一部分。|臺灣並不是中國不可分割的一部分。 |
|
43 |
+
| 3|如果可以是韋[MASK]安的最新歌[MASK]。|如果可以是韋禮安的最新歌曲。 |
|
44 |
+
| 4|[MASK]水老[MASK]有賣很多鐵蛋的攤販。|淡水老街有賣很多鐵蛋的攤販。 |
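
The rows above pair a multi-`[MASK]` input with a fully filled output. A minimal sketch of how such a row could be produced follows, assuming greedy top-1 decoding with all masks predicted in a single forward pass; the README does not state the decoding method actually used. `fill_masks`, `predict_masks`, and the `model_id` argument are hypothetical names introduced for illustration only.

```python
MASK = "[MASK]"


def fill_masks(text: str, tokens: list[str], mask: str = MASK) -> str:
    """Replace each mask in `text`, left to right, with the matching token."""
    parts = text.split(mask)
    if len(parts) - 1 != len(tokens):
        raise ValueError(
            f"expected {len(parts) - 1} tokens for the masks, got {len(tokens)}"
        )
    out = [parts[0]]
    for token, rest in zip(tokens, parts[1:]):
        out.append(token)
        out.append(rest)
    return "".join(out)


def predict_masks(text: str, model_id: str) -> str:
    """Greedy top-1 fill of every mask (assumed method, not the author's).

    `model_id` is a placeholder for a masked-LM checkpoint you have access to.
    Imports are local so that `fill_masks` works without torch/transformers.
    """
    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

    # Positions of the mask token in the single input sequence.
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    mask_positions = is_mask.nonzero(as_tuple=True)[0]

    # Highest-scoring vocabulary id at each masked position.
    top_ids = logits[0, mask_positions].argmax(dim=-1)
    tokens = tokenizer.convert_ids_to_tokens(top_ids.tolist())
    return fill_masks(text, tokens, mask=tokenizer.mask_token)
```

`fill_masks` alone reproduces the string substitution seen in the table once the predicted characters are known; `predict_masks` needs `torch`, `transformers`, and access to a checkpoint. Predicting all masks independently ignores interactions between them, so iterative filling may give different results.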
|
45 |
|
46 |
## How to Get Started With the Model
|
47 |
+
#### Private Model Download
|
48 |
+
|
49 |
+
**Installation**
|
50 |
+
```
|
51 |
+
$ curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
|
52 |
+
$ sudo apt-get install git-lfs
|
53 |
+
$ git lfs install
|
54 |
+
$ pip install huggingface_hub
|
55 |
+
|
56 |
+
```
|
57 |
+
**Log in to Hugging Face**
|
58 |
+
|
59 |
+
```
|
60 |
+
$ huggingface-cli login
|
61 |
+
Token: your own 'write' token
|
62 |
+
```
|
63 |
+
|
64 |
+
**Python Code**
|
65 |
+
|
66 |
```python
|
67 |
from transformers import AutoTokenizer, AutoModelForMaskedLM
|
68 |
|