PaddlePaddle/ernie-3.0-nano-zh
Intro
The ERNIE 3.0 lightweight models are obtained by distilling ERNIE 3.0, a Wenxin large model. Their architecture is the same as ERNIE 2.0, and they perform better than ERNIE 2.0 on Chinese tasks.
For a detailed explanation of the related technologies, see the article 解析全球最大中文单体模型鹏城-百度·文心技术细节 ("An analysis of the technical details of Peng Cheng-Baidu·Wenxin, the world's largest Chinese monolithic model").
How to Use
Click the "Use in paddlenlp" button in the top right corner of the model page.
Performance
Six ERNIE 3.0 models are open-sourced:
- ERNIE 3.0-XBase (20-layer, 1024-hidden, 16-heads)
- ERNIE 3.0-Base (12-layer, 768-hidden, 12-heads)
- ERNIE 3.0-Medium (6-layer, 768-hidden, 12-heads)
- ERNIE 3.0-Mini (6-layer, 384-hidden, 12-heads)
- ERNIE 3.0-Micro (4-layer, 384-hidden, 12-heads)
- ERNIE 3.0-Nano (4-layer, 312-hidden, 12-heads)
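The list above fixes each variant's depth (layers), width (hidden size) and number of attention heads. As a rough sketch, the snippet below derives the per-head dimension and estimates the encoder-only parameter count of each variant. It assumes a standard BERT-style Transformer layer with a feed-forward size of 4 × hidden; that ratio and the counting formula are assumptions rather than values from this card, and embedding parameters (which depend on the vocabulary size) are excluded, so the totals will not match the parameter counts shown in the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErnieConfig:
    """Depth/width settings copied from the list above."""
    name: str
    layers: int
    hidden: int
    heads: int

    @property
    def head_dim(self) -> int:
        # Each attention head operates on an equal slice of the hidden vector.
        assert self.hidden % self.heads == 0
        return self.hidden // self.heads

    def encoder_params(self, ffn_ratio: int = 4) -> int:
        """Rough encoder-only parameter count (embeddings excluded).

        ASSUMPTION: BERT-style layer with Q/K/V/output projections, a two-layer
        feed-forward block of size ffn_ratio * hidden, and two LayerNorms.
        """
        h, f = self.hidden, ffn_ratio * self.hidden
        attention = 4 * (h * h + h)        # Q, K, V and output projections (weights + biases)
        ffn = (h * f + f) + (f * h + h)    # up- and down-projections (weights + biases)
        layer_norms = 2 * (2 * h)          # two LayerNorms, each with gamma and beta
        return self.layers * (attention + ffn + layer_norms)


CONFIGS = [
    ErnieConfig("ERNIE 3.0-XBase", 20, 1024, 16),
    ErnieConfig("ERNIE 3.0-Base", 12, 768, 12),
    ErnieConfig("ERNIE 3.0-Medium", 6, 768, 12),
    ErnieConfig("ERNIE 3.0-Mini", 6, 384, 12),
    ErnieConfig("ERNIE 3.0-Micro", 4, 384, 12),
    ErnieConfig("ERNIE 3.0-Nano", 4, 312, 12),
]

if __name__ == "__main__":
    for cfg in CONFIGS:
        millions = cfg.encoder_params() / 1e6
        print(f"{cfg.name:17s} head_dim={cfg.head_dim:3d} encoder_params~{millions:.1f}M")
```

Under these assumptions each layer contributes 12·H² + 13·H parameters, so width dominates the count: Nano's 312-dim heads are split into 12 heads of 26 dimensions each.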
Below is the precision-latency graph of the small Chinese models in PaddleNLP. The x-axis is the latency (in ms) measured on the CLUE IFLYTEK dataset with the maximum sequence length set to 128. The y-axis is the average accuracy over 10 CLUE tasks (covering text classification, text matching, natural language inference, pronoun disambiguation, machine reading comprehension and other tasks); the metric for CMRC2018 is Exact Match (EM), and the metric for all other tasks is Accuracy. The closer a model is to the top left of the figure, the better its combination of accuracy and inference speed.
The parameter count of each model is marked under its name in the figure. For the test environment, see the Performance Test section for details.
Precision-latency graph on CPU (1 and 8 threads), batch_size = 32:
Precision-latency graph on CPU (1 and 8 threads), batch_size = 1:
Precision-latency graph on GPU, batch_size = 32 and 1:
As the figures show, the ERNIE 3.0 Tiny models lead UER-py, Huawei-Noah and HFL in both accuracy and performance. When batch_size=1 and FP16 precision is used, the wide and shallow models have a larger inference advantage on GPU.
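Reading the graph as "closer to the top left is better" amounts to Pareto dominance: model A beats model B if it is at least as fast and at least as accurate, and strictly better on one of the two axes. The minimal sketch below expresses that rule; the model names and (latency, accuracy) values are hypothetical placeholders, not measurements taken from the figures.

```python
from __future__ import annotations

from typing import NamedTuple


class Point(NamedTuple):
    name: str
    latency_ms: float  # x-axis: lower is better
    accuracy: float    # y-axis: higher is better


def dominates(a: Point, b: Point) -> bool:
    """True if `a` lies up and to the left of `b`: no worse on both axes, better on one."""
    no_worse = a.latency_ms <= b.latency_ms and a.accuracy >= b.accuracy
    strictly_better = a.latency_ms < b.latency_ms or a.accuracy > b.accuracy
    return no_worse and strictly_better


def pareto_front(points: list[Point]) -> list[Point]:
    """Points not dominated by any other point, ordered by latency."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(front, key=lambda p: p.latency_ms)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    candidates = [
        Point("model_a", 5.0, 62.0),
        Point("model_b", 8.0, 61.0),   # slower and less accurate than model_a
        Point("model_c", 12.0, 67.0),  # slower but more accurate: still on the front
    ]
    print([p.name for p in pareto_front(candidates)])
```

A model that no other model dominates sits on the top-left frontier; every model below or to the right of that frontier is beaten on both accuracy and latency by at least one alternative.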
The accuracy results on the CLUE validation sets are shown in the following table:
| Arch | Model | AVG | AFQMC | TNEWS | IFLYTEK | CMNLI | OCNLI | CLUEWSC2020 | CSL | CMRC2018 (EM/F1) | CHID | C3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 24L1024H | ERNIE 1.0-Large-cw | 79.03 | 75.97 | 59.65 | 62.91 | 85.09 | 81.73 | 93.09 | 84.53 | 74.22/91.88 | 88.57 | 84.54 |
|  | ERNIE 2.0-Large-zh | 76.90 | 76.23 | 59.33 | 61.91 | 83.85 | 79.93 | 89.82 | 83.23 | 70.95/90.31 | 86.78 | 78.12 |
|  | RoBERTa-wwm-ext-large | 76.61 | 76.00 | 59.33 | 62.02 | 83.88 | 78.81 | 90.79 | 83.67 | 70.58/89.82 | 85.72 | 75.26 |
| 20L1024H | ERNIE 3.0-Xbase-zh | 78.39 | 76.16 | 59.55 | 61.87 | 84.40 | 81.73 | 88.82 | 83.60 | 75.99/93.00 | 86.78 | 84.98 |
| 12L768H | ERNIE 3.0-Base-zh | 76.05 | 75.93 | 58.26 | 61.56 | 83.02 | 80.10 | 86.18 | 82.63 | 70.71/90.41 | 84.26 | 77.88 |
|  | ERNIE 1.0-Base-zh-cw | 76.47 | 76.07 | 57.86 | 59.91 | 83.41 | 79.58 | 89.91 | 83.42 | 72.88/90.78 | 84.68 | 76.98 |
|  | ERNIE-Gram-zh | 75.72 | 75.28 | 57.88 | 60.87 | 82.90 | 79.08 | 88.82 | 82.83 | 71.82/90.38 | 84.04 | 73.69 |
|  | Langboat/Mengzi-BERT-Base | 74.69 | 75.35 | 57.76 | 61.64 | 82.41 | 77.93 | 88.16 | 82.20 | 67.04/88.35 | 83.74 | 70.70 |
|  | ERNIE 2.0-Base-zh | 74.32 | 75.65 | 58.25 | 61.64 | 82.62 | 78.71 | 81.91 | 82.33 | 66.08/87.46 | 82.78 | 73.19 |
|  | ERNIE 1.0-Base-zh | 74.17 | 74.84 | 58.91 | 62.25 | 81.68 | 76.58 | 85.20 | 82.77 | 67.32/87.83 | 82.47 | 69.68 |
|  | RoBERTa-wwm-ext | 74.11 | 74.60 | 58.08 | 61.23 | 81.11 | 76.92 | 88.49 | 80.77 | 68.39/88.50 | 83.43 | 68.03 |
|  | BERT-Base-Chinese | 72.57 | 74.63 | 57.13 | 61.29 | 80.97 | 75.22 | 81.91 | 81.90 | 65.30/86.53 | 82.01 | 65.38 |
|  | UER/Chinese-RoBERTa-Base | 71.78 | 72.89 | 57.62 | 61.14 | 80.01 | 75.56 | 81.58 | 80.80 | 63.87/84.95 | 81.52 | 62.76 |
| 8L512H | UER/Chinese-RoBERTa-Medium | 67.06 | 70.64 | 56.10 | 58.29 | 77.35 | 71.90 | 68.09 | 78.63 | 57.63/78.91 | 75.13 | 56.84 |
| 6L768H | ERNIE 3.0-Medium-zh | 72.49 | 73.37 | 57.00 | 60.67 | 80.64 | 76.88 | 79.28 | 81.60 | 65.83/87.30 | 79.91 | 69.73 |
|  | HFL/RBT6, Chinese | 70.06 | 73.45 | 56.82 | 59.64 | 79.36 | 73.32 | 76.64 | 80.67 | 62.72/84.77 | 78.17 | 59.85 |
|  | TinyBERT6, Chinese | 69.62 | 72.22 | 55.70 | 54.48 | 79.12 | 74.07 | 77.63 | 80.17 | 63.03/83.75 | 77.64 | 62.11 |
|  | RoFormerV2 Small | 68.52 | 72.47 | 56.53 | 60.72 | 76.37 | 72.95 | 75.00 | 81.07 | 62.97/83.64 | 67.66 | 59.41 |
|  | UER/Chinese-RoBERTa-L6-H768 | 67.09 | 70.13 | 56.54 | 60.48 | 77.49 | 72.00 | 72.04 | 77.33 | 53.74/75.52 | 76.73 | 54.40 |
| 6L384H | ERNIE 3.0-Mini-zh | 66.90 | 71.85 | 55.24 | 54.48 | 77.19 | 73.08 | 71.05 | 79.30 | 58.53/81.97 | 69.71 | 58.60 |
| 4L768H | HFL/RBT4, Chinese | 67.42 | 72.41 | 56.50 | 58.95 | 77.34 | 70.78 | 71.05 | 78.23 | 59.30/81.93 | 73.18 | 56.45 |
| 4L512H | UER/Chinese-RoBERTa-Small | 63.25 | 69.21 | 55.41 | 57.552 | 73.64 | 69.80 | 66.78 | 74.83 | 46.75/69.69 | 67.59 | 50.92 |
| 4L384H | ERNIE 3.0-Micro-zh | 64.21 | 71.15 | 55.05 | 53.83 | 74.81 | 70.41 | 69.08 | 76.50 | 53.77/77.82 | 62.26 | 55.53 |
| 4L312H | ERNIE 3.0-Nano-zh | 62.97 | 70.51 | 54.57 | 48.36 | 74.97 | 70.61 | 68.75 | 75.93 | 52.00/76.35 | 58.91 | 55.11 |
|  | TinyBERT4, Chinese | 60.82 | 69.07 | 54.02 | 39.71 | 73.94 | 69.59 | 70.07 | 75.07 | 46.04/69.34 | 58.53 | 52.18 |
| 4L256H | UER/Chinese-RoBERTa-Mini | 53.40 | 69.32 | 54.22 | 41.63 | 69.40 | 67.36 | 65.13 | 70.07 | 5.96/17.13 | 51.19 | 39.68 |
| 3L1024H | HFL/RBTL3, Chinese | 66.63 | 71.11 | 56.14 | 59.56 | 76.41 | 71.29 | 69.74 | 76.93 | 58.50/80.90 | 71.03 | 55.56 |
| 3L768H | HFL/RBT3, Chinese | 65.72 | 70.95 | 55.53 | 59.18 | 76.20 | 70.71 | 67.11 | 76.63 | 55.73/78.63 | 70.26 | 54.93 |
| 2L128H | UER/Chinese-RoBERTa-Tiny | 44.45 | 69.02 | 51.47 | 20.28 | 59.95 | 57.73 | 63.82 | 67.43 | 3.08/14.33 | 23.57 | 28.12 |
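As stated in the Performance section, AVG is the plain mean over the 10 CLUE tasks, with CMRC2018 contributing only its EM score (the first number of its pair). The short sketch below reproduces the AVG column from one row of the table; the helper `clue_average` is a hypothetical illustration, not part of PaddleNLP.

```python
from __future__ import annotations


def clue_average(scores: list[str]) -> float:
    """Mean of the 10 CLUE task scores, given in table column order (AFQMC ... C3).

    The CMRC2018 cell holds "EM/F1"; only EM (the value before the slash) is
    averaged. Cells without a slash are used as-is.
    """
    values = [float(cell.split("/")[0]) for cell in scores]
    if len(values) != 10:
        raise ValueError(f"expected 10 task scores, got {len(values)}")
    return sum(values) / len(values)


if __name__ == "__main__":
    # ERNIE 3.0-Nano-zh row, columns AFQMC ... C3 copied from the table.
    nano = ["70.51", "54.57", "48.36", "74.97", "70.61",
            "68.75", "75.93", "52.00/76.35", "58.91", "55.11"]
    print(f"{clue_average(nano):.2f}")  # prints 62.97, matching the AVG column
```

Averaging F1 instead of EM for CMRC2018 would give a different number, so the reproduced value is a quick check that the AVG column follows the metric definition above.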
Citation Info
@article{sun2021ernie,
title={Ernie 3.0: Large-scale knowledge enhanced pre-training for language understanding and generation},
author={Sun, Yu and Wang, Shuohuan and Feng, Shikun and Ding, Siyu and Pang, Chao and Shang, Junyuan and Liu, Jiaxiang and Chen, Xuyi and Zhao, Yanbin and Lu, Yuxiang and others},
journal={arXiv preprint arXiv:2107.02137},
year={2021}
}
@article{su2021ernie,
title={Ernie-tiny: A progressive distillation framework for pretrained transformer compression},
author={Su, Weiyue and Chen, Xuyi and Feng, Shikun and Liu, Jiaxiang and Liu, Weixin and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:2106.02241},
year={2021}
}
@article{wang2021ernie,
title={Ernie 3.0 titan: Exploring larger-scale knowledge enhanced pre-training for language understanding and generation},
author={Wang, Shuohuan and Sun, Yu and Xiang, Yang and Wu, Zhihua and Ding, Siyu and Gong, Weibao and Feng, Shikun and Shang, Junyuan and Zhao, Yanbin and Pang, Chao and others},
journal={arXiv preprint arXiv:2112.12731},
year={2021}
}