PaddlePaddle/ernie-3.0-nano-zh

Intro

The ERNIE 3.0 lightweight models are obtained through distillation from ERNIE 3.0, Baidu's Wenxin large model. They share the ERNIE 2.0 model architecture and perform better than ERNIE 2.0 on Chinese tasks.

For a detailed explanation of the related techniques, see the article "Analyzing the Technical Details of PCL-Baidu Wenxin, the World's Largest Chinese Single Model" (解析全球最大中文单体模型鹏城-百度·文心技术细节).

How to Use

Click the "Use in paddlenlp" button in the top-right corner of the model page.

Performance

Six ERNIE 3.0 models are open-sourced: ERNIE 3.0 XBase, ERNIE 3.0 Base, ERNIE 3.0 Medium, ERNIE 3.0 Mini, ERNIE 3.0 Micro, and ERNIE 3.0 Nano:

ERNIE 3.0-XBase (20-layer, 1024-hidden, 16-heads)
ERNIE 3.0-Base (12-layer, 768-hidden, 12-heads)
ERNIE 3.0-Medium (6-layer, 768-hidden, 12-heads)
ERNIE 3.0-Mini (6-layer, 384-hidden, 12-heads)
ERNIE 3.0-Micro (4-layer, 384-hidden, 12-heads)
ERNIE 3.0-Nano (4-layer, 312-hidden, 12-heads)
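The six configurations differ only in depth, width, and number of attention heads, so their relative size can be estimated with a back-of-the-envelope parameter count. The sketch below is a minimal estimate for a BERT-style encoder layer, assuming a feed-forward width of 4x the hidden size (the usual BERT convention; this card does not state it). It deliberately excludes the embedding table and pooler, whose sizes depend on a vocabulary size and position count not given here, so it will not match the parameter counts shown in the figures. The names `EncoderConfig` and `encoder_stack_params` are illustrative, not PaddleNLP APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    name: str
    layers: int
    hidden: int
    heads: int

    @property
    def head_dim(self) -> int:
        # Width of each attention head.
        return self.hidden // self.heads

    @property
    def intermediate(self) -> int:
        # Assumption: feed-forward width is 4x the hidden size, as in BERT.
        return 4 * self.hidden


def encoder_layer_params(hidden: int, intermediate: int) -> int:
    """Weights and biases of one BERT-style Transformer encoder layer."""
    attention = 4 * (hidden * hidden + hidden)  # Q, K, V and output projections
    ffn = hidden * intermediate + intermediate + intermediate * hidden + hidden
    layer_norms = 2 * (2 * hidden)  # two LayerNorms, each with gain and bias
    return attention + ffn + layer_norms


def encoder_stack_params(cfg: EncoderConfig) -> int:
    """Parameters of the encoder layers only (embeddings and pooler excluded)."""
    return cfg.layers * encoder_layer_params(cfg.hidden, cfg.intermediate)


ERNIE_3_0 = [
    EncoderConfig("XBase", 20, 1024, 16),
    EncoderConfig("Base", 12, 768, 12),
    EncoderConfig("Medium", 6, 768, 12),
    EncoderConfig("Mini", 6, 384, 12),
    EncoderConfig("Micro", 4, 384, 12),
    EncoderConfig("Nano", 4, 312, 12),
]

if __name__ == "__main__":
    for cfg in ERNIE_3_0:
        print(f"{cfg.name:>6}: head_dim={cfg.head_dim:>3}, "
              f"encoder params ~ {encoder_stack_params(cfg) / 1e6:.2f}M")
```

Under these assumptions the Nano encoder stack holds 4,688,736 parameters, so any vocabulary larger than about 15,000 tokens makes its 312-wide embedding table bigger than all four encoder layers combined. The sketch also shows that Nano's 312 hidden units split across 12 heads give 26-dimensional heads, versus 64 for Base.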

Below is the precision-latency graph of the small Chinese models in PaddleNLP. The x-axis is latency in ms, measured on the CLUE IFLYTEK dataset with the maximum sequence length set to 128. The y-axis is the average accuracy over 10 CLUE tasks, including text classification, text matching, natural language inference, pronoun disambiguation, and machine reading comprehension; CMRC2018 is scored by Exact Match (EM) and every other task by accuracy. The closer a model is to the top-left corner of the figure, the better its combination of accuracy and speed.

The parameter count of each model is marked under its name in the figure. For details of the test environment, see Performance Test.
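Latency numbers such as those in the graphs depend on the timing harness as much as on the model. Below is a minimal timing sketch, assuming untimed warm-up calls followed by repeated timed calls with the median reported; it is not PaddleNLP's benchmark script (see Performance Test for the actual setup), and whether the published figures use a mean or a median is not stated here. `fake_forward` is a hypothetical stand-in for one inference call on a batch padded to sequence length 128. Thread count (1 vs. 8) and FP16 precision would be configured in the inference runtime, outside this sketch.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(run_batch: Callable[[], object],
                       warmup: int = 10,
                       repeats: int = 100) -> float:
    """Median wall-clock latency of one run_batch() call, in milliseconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        run_batch()  # untimed: lets lazy initialization and caches settle
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_batch()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def fake_forward(batch_size: int, seq_len: int = 128) -> int:
    # Hypothetical stand-in for a model forward pass over one padded batch.
    return sum(i % 7 for i in range(batch_size * seq_len))


if __name__ == "__main__":
    for bs in (1, 32):
        ms = measure_latency_ms(lambda: fake_forward(bs), warmup=5, repeats=20)
        print(f"batch_size={bs}: {ms:.3f} ms per batch")
```

The median is used here because it is less sensitive than the mean to occasional slow runs caused by other processes.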

[Figure: precision-latency graph on CPU (1 and 8 threads), batch_size = 32]

[Figure: precision-latency graph on CPU (1 and 8 threads), batch_size = 1]

[Figure: precision-latency graph on GPU, batch_size = 32 and 1]

As the figures show, the ERNIE Tiny 3.0 models lead the UER-py, Huawei-Noah, and HFL models overall in both accuracy and latency. With batch_size=1 and FP16 precision, wide and shallow models have an even larger inference-speed advantage on the GPU.
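Reading the graphs as "closer to the top left is better" amounts to asking which models are not dominated, that is, no other model is both at least as fast and at least as accurate. A minimal sketch of that check follows; the model names and (latency, accuracy) pairs in the usage are hypothetical placeholders, not values read from the graphs.

```python
from typing import List, NamedTuple


class ModelPoint(NamedTuple):
    name: str
    latency_ms: float  # x-axis of the graph: lower is better
    accuracy: float    # y-axis of the graph: higher is better


def top_left_frontier(points: List[ModelPoint]) -> List[ModelPoint]:
    """Return the models that no other model dominates, fastest first.

    A model is dominated when another one is at least as fast and at least
    as accurate, and strictly better on one axis. Exact duplicates are kept once.
    """
    # Fastest first; among equal latencies, most accurate first.
    ordered = sorted(points, key=lambda p: (p.latency_ms, -p.accuracy))
    frontier: List[ModelPoint] = []
    best_accuracy = float("-inf")
    for p in ordered:
        # Every model already seen is at least as fast, so p survives
        # only if it is strictly more accurate than all of them.
        if p.accuracy > best_accuracy:
            frontier.append(p)
            best_accuracy = p.accuracy
    return frontier


# Hypothetical points, not read from the graphs.
points = [
    ModelPoint("model_a", 2.0, 60.0),
    ModelPoint("model_b", 4.0, 70.0),
    ModelPoint("model_c", 5.0, 65.0),  # slower and less accurate than model_b
    ModelPoint("model_d", 8.0, 75.0),
]
print([p.name for p in top_left_frontier(points)])
# ['model_a', 'model_b', 'model_d']
```

Sorting by latency first makes the check a single pass: a model only needs to beat the best accuracy seen among the faster models.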

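Results on the CLUE validation sets are shown in the following table. In the Arch column, nLmH denotes n layers with hidden size m; for example, 24L1024H means 24 layers with a hidden size of 1024.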

Arch	Model	AVG	AFQMC	TNEWS	IFLYTEK	CMNLI	OCNLI	CLUEWSC2020	CSL	CMRC2018 (EM/F1)	CHID	C³
24L1024H	ERNIE 1.0-Large-cw	79.03	75.97	59.65	62.91	85.09	81.73	93.09	84.53	74.22/91.88	88.57	84.54
	ERNIE 2.0-Large-zh	76.90	76.23	59.33	61.91	83.85	79.93	89.82	83.23	70.95/90.31	86.78	78.12
	RoBERTa-wwm-ext-large	76.61	76.00	59.33	62.02	83.88	78.81	90.79	83.67	70.58/89.82	85.72	75.26
20L1024H	ERNIE 3.0-Xbase-zh	78.39	76.16	59.55	61.87	84.40	81.73	88.82	83.60	75.99/93.00	86.78	84.98
12L768H	ERNIE 3.0-Base-zh	76.05	75.93	58.26	61.56	83.02	80.10	86.18	82.63	70.71/90.41	84.26	77.88
	ERNIE 1.0-Base-zh-cw	76.47	76.07	57.86	59.91	83.41	79.58	89.91	83.42	72.88/90.78	84.68	76.98
	ERNIE-Gram-zh	75.72	75.28	57.88	60.87	82.90	79.08	88.82	82.83	71.82/90.38	84.04	73.69
	Langboat/Mengzi-BERT-Base	74.69	75.35	57.76	61.64	82.41	77.93	88.16	82.20	67.04/88.35	83.74	70.70
	ERNIE 2.0-Base-zh	74.32	75.65	58.25	61.64	82.62	78.71	81.91	82.33	66.08/87.46	82.78	73.19
	ERNIE 1.0-Base-zh	74.17	74.84	58.91	62.25	81.68	76.58	85.20	82.77	67.32/87.83	82.47	69.68
	RoBERTa-wwm-ext	74.11	74.60	58.08	61.23	81.11	76.92	88.49	80.77	68.39/88.50	83.43	68.03
	BERT-Base-Chinese	72.57	74.63	57.13	61.29	80.97	75.22	81.91	81.90	65.30/86.53	82.01	65.38
	UER/Chinese-RoBERTa-Base	71.78	72.89	57.62	61.14	80.01	75.56	81.58	80.80	63.87/84.95	81.52	62.76
8L512H	UER/Chinese-RoBERTa-Medium	67.06	70.64	56.10	58.29	77.35	71.90	68.09	78.63	57.63/78.91	75.13	56.84
6L768H	ERNIE 3.0-Medium-zh	72.49	73.37	57.00	60.67	80.64	76.88	79.28	81.60	65.83/87.30	79.91	69.73
	HFL/RBT6, Chinese	70.06	73.45	56.82	59.64	79.36	73.32	76.64	80.67	62.72/84.77	78.17	59.85
	TinyBERT₆, Chinese	69.62	72.22	55.70	54.48	79.12	74.07	77.63	80.17	63.03/83.75	77.64	62.11
	RoFormerV2 Small	68.52	72.47	56.53	60.72	76.37	72.95	75.00	81.07	62.97/83.64	67.66	59.41
	UER/Chinese-RoBERTa-L6-H768	67.09	70.13	56.54	60.48	77.49	72.00	72.04	77.33	53.74/75.52	76.73	54.40
6L384H	ERNIE 3.0-Mini-zh	66.90	71.85	55.24	54.48	77.19	73.08	71.05	79.30	58.53/81.97	69.71	58.60
4L768H	HFL/RBT4, Chinese	67.42	72.41	56.50	58.95	77.34	70.78	71.05	78.23	59.30/81.93	73.18	56.45
4L512H	UER/Chinese-RoBERTa-Small	63.25	69.21	55.41	57.55	73.64	69.80	66.78	74.83	46.75/69.69	67.59	50.92
4L384H	ERNIE 3.0-Micro-zh	64.21	71.15	55.05	53.83	74.81	70.41	69.08	76.50	53.77/77.82	62.26	55.53
4L312H	ERNIE 3.0-Nano-zh	62.97	70.51	54.57	48.36	74.97	70.61	68.75	75.93	52.00/76.35	58.91	55.11
4L312H	TinyBERT₄, Chinese	60.82	69.07	54.02	39.71	73.94	69.59	70.07	75.07	46.04/69.34	58.53	52.18
4L256H	UER/Chinese-RoBERTa-Mini	53.40	69.32	54.22	41.63	69.40	67.36	65.13	70.07	5.96/17.13	51.19	39.68
3L1024H	HFL/RBTL3, Chinese	66.63	71.11	56.14	59.56	76.41	71.29	69.74	76.93	58.50/80.90	71.03	55.56
3L768H	HFL/RBT3, Chinese	65.72	70.95	55.53	59.18	76.20	70.71	67.11	76.63	55.73/78.63	70.26	54.93
2L128H	UER/Chinese-RoBERTa-Tiny	44.45	69.02	51.47	20.28	59.95	57.73	63.82	67.43	3.08/14.33	23.57	28.12
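The AVG column can be recomputed from the ten task columns. Below is a minimal sketch, assuming AVG is the unweighted mean of the ten task scores with only the EM half of each CMRC2018 "EM/F1" cell counted, as the metric description above states. Under that assumption it reproduces the listed AVG for, e.g., the ERNIE 3.0-Nano-zh and ERNIE 3.0-Base-zh rows. `task_scores` and `clue_average` are illustrative helper names.

```python
from statistics import mean

# Task columns of the CLUE table, in order ("C3" stands for C³).
TASKS = ["AFQMC", "TNEWS", "IFLYTEK", "CMNLI", "OCNLI",
         "CLUEWSC2020", "CSL", "CMRC2018", "CHID", "C3"]


def task_scores(row: str) -> dict:
    """Map task name -> score for one tab-separated row of the table."""
    cells = row.split("\t")[-len(TASKS):]  # drop the Arch, Model and AVG cells
    if len(cells) != len(TASKS):
        raise ValueError("row does not contain all ten task columns")
    # A CMRC2018 cell looks like "EM/F1"; keep only EM, the averaged metric.
    return {task: float(cell.split("/")[0]) for task, cell in zip(TASKS, cells)}


def clue_average(row: str) -> float:
    """Unweighted mean over the ten tasks, rounded like the AVG column."""
    return round(mean(list(task_scores(row).values())), 2)


nano_row = ("4L312H\tERNIE 3.0-Nano-zh\t62.97\t70.51\t54.57\t48.36\t74.97"
            "\t70.61\t68.75\t75.93\t52.00/76.35\t58.91\t55.11")
print(clue_average(nano_row))  # 62.97
```

Recomputing is also a cheap consistency check on the table: the ten task scores in the ERNIE 2.0-Large-zh row, for instance, average to about 77.0 rather than the listed 76.90.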

Citation Info

@article{sun2021ernie,
  title={Ernie 3.0: Large-scale knowledge enhanced pre-training for language understanding and generation},
  author={Sun, Yu and Wang, Shuohuan and Feng, Shikun and Ding, Siyu and Pang, Chao and Shang, Junyuan and Liu, Jiaxiang and Chen, Xuyi and Zhao, Yanbin and Lu, Yuxiang and others},
  journal={arXiv preprint arXiv:2107.02137},
  year={2021}
}

@article{su2021ernie,
  title={Ernie-tiny: A progressive distillation framework for pretrained transformer compression},
  author={Su, Weiyue and Chen, Xuyi and Feng, Shikun and Liu, Jiaxiang and Liu, Weixin and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
  journal={arXiv preprint arXiv:2106.02241},
  year={2021}
}

@article{wang2021ernie,
  title={Ernie 3.0 titan: Exploring larger-scale knowledge enhanced pre-training for language understanding and generation},
  author={Wang, Shuohuan and Sun, Yu and Xiang, Yang and Wu, Zhihua and Ding, Siyu and Gong, Weibao and Feng, Shikun and Shang, Junyuan and Zhao, Yanbin and Pang, Chao and others},
  journal={arXiv preprint arXiv:2112.12731},
  year={2021}
}