Update README.md
README.md CHANGED
@@ -6,12 +6,6 @@ language:
 pipeline_tag: fill-mask
 ---
 
-| Dataset\BERT Pretrain | bert-based-chinese | ckiplab | GufoLab |
-| ------------- |:-------------:|:-------------:|:-------------:|
-| 5000 Traditional Chinese Dataset |0.7183| 0.6989| **0.8081**|
-| 10000 Sol-Idea Dataset | 0.7874| 0.7913| **0.8025**|
-| ALL DataSet | 0.7694| 0.7678| **0.8038**|
-
 ### Model Sources
 - **Paper:** [BERT](https://arxiv.org/abs/1810.04805)
 
@@ -22,13 +16,6 @@ pipeline_tag: fill-mask
 This model can be used for masked language modeling
 
 
-
-## Risks, Limitations and Biases
-**CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.**
-
-Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).
-
-
 ## Training
 
 #### Training Procedure
@@ -41,12 +28,41 @@ botp/yentinglin-zh_TW_c4
 
 ## Evaluation
 
-
+| Dataset\BERT Pretrain | bert-based-chinese | ckiplab | GufoLab |
+| ------------- |:-------------:|:-------------:|:-------------:|
+| 5000 Traditional Chinese Dataset |0.7183| 0.6989| **0.8081**|
+| 10000 Sol-Idea Dataset | 0.7874| 0.7913| **0.8025**|
+| ALL DataSet | 0.7694| 0.7678| **0.8038**|
 
-
+#### Results
 
+| Test ID\Results | [MASK] Input | Result Output |
+| -------------|-------------|-------------|
+| 1|今天禮拜[MASK]?我[MASK]是很想[MASK]班。|今天禮拜六?我不是很想上班。 |
+| 2|[MASK]灣並[MASK]是[MASK]國不可分割的一部分。|臺灣並不是中國不可分割的一部分。 |
+| 3|如果可以是韋[MASK]安的最新歌[MASK]。|如果可以是韋禮安的最新歌曲。 |
+| 4|[MASK]水老[MASK]有賣很多鐵蛋的攤販。|淡水老街有賣很多鐵蛋的攤販。 |
 
 ## How to Get Started With the Model
+#### Private Model Download
+
+**Installation**
+```
+$ curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
+$ sudo apt-get install git-lfs
+$ git lfs install
+$ pip install huggingface_hub
+
+```
+**Log in to Hugging Face**
+
+```
+$ huggingface-cli login
+Token:Your own 'write' token.
+```
+
+**Python Code**
+
 ```python
 from transformers import AutoTokenizer, AutoModelForMaskedLM
 
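The hunk above stops at the import line, so the rest of the README's Python snippet is not visible in this diff. As a separate, minimal sketch (not the author's code), the following shows how a `[MASK]` input from the Results table maps to its output. `fill_masks` and `predict_masks` are hypothetical helper names, and the repository ID must be supplied by the caller because it does not appear in the diff. `predict_masks` also assumes a recent `transformers` release that picks up the token cached by `huggingface-cli login`.

```python
from typing import List

MASK = "[MASK]"


def fill_masks(text: str, predictions: List[str]) -> str:
    """Replace each [MASK] in `text`, left to right, with the matching prediction."""
    parts = text.split(MASK)
    if len(parts) - 1 != len(predictions):
        raise ValueError(
            f"expected {len(parts) - 1} predictions, got {len(predictions)}"
        )
    pieces = [parts[0]]
    for prediction, tail in zip(predictions, parts[1:]):
        pieces.append(prediction)
        pieces.append(tail)
    return "".join(pieces)


def predict_masks(text: str, model_id: str) -> List[str]:
    """Return the top-1 token for every [MASK] in `text`.

    Assumption: `model_id` is a masked-LM repository the logged-in user can
    read. Imports are local so that `fill_masks` works without torch installed.
    """
    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Positions of the mask token in the single input sequence.
    mask_positions = (
        inputs["input_ids"][0] == tokenizer.mask_token_id
    ).nonzero(as_tuple=True)[0]
    top_ids = logits[0, mask_positions].argmax(dim=-1)
    return tokenizer.convert_ids_to_tokens(top_ids.tolist())
```

With a placeholder repository ID, usage would look like `fill_masks(text, predict_masks(text, "your-org/your-model"))`. Each mask is predicted independently here, which is the simplest choice; the diff does not say which decoding strategy produced the table's outputs.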