Bioformer: an efficient transformer language model for biomedical text mining
Abstract
Pretrained language models such as Bidirectional Encoder Representations from Transformers (BERT) have achieved state-of-the-art performance in natural language processing (NLP) tasks. Recently, BERT has been adapted to the biomedical domain. Despite their effectiveness, these models have hundreds of millions of parameters and are computationally expensive when applied to large-scale NLP applications. We hypothesized that the number of parameters of the original BERT can be dramatically reduced with minor impact on performance. In this study, we present Bioformer, a compact BERT model for biomedical text mining. We pretrained two Bioformer models (named Bioformer8L and Bioformer16L) which reduced the model size by 60% compared to BERTBase. Bioformer uses a biomedical vocabulary and was pretrained from scratch on PubMed abstracts and PubMed Central full-text articles. We thoroughly evaluated the performance of Bioformer as well as existing biomedical BERT models including BioBERT and PubMedBERT on 15 benchmark datasets of four different biomedical NLP tasks: named entity recognition, relation extraction, question answering and document classification. The results show that with 60% fewer parameters, Bioformer16L is only 0.1% less accurate than PubMedBERT, while Bioformer8L is 0.9% less accurate than PubMedBERT. Both Bioformer16L and Bioformer8L outperformed BioBERTBase-v1.1. In addition, Bioformer16L and Bioformer8L are 2-3 times as fast as PubMedBERT/BioBERTBase-v1.1. Bioformer has been successfully deployed to PubTator Central, providing gene annotations for over 35 million PubMed abstracts and 5 million PubMed Central full-text articles. We make Bioformer publicly available via https://github.com/WGLab/bioformer, including pre-trained models, datasets, and instructions for downstream use.
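Since the abstract states that the pre-trained checkpoints are distributed for downstream use, the sketch below shows one plausible way to load such a BERT-style checkpoint with the Hugging Face transformers library. The repository identifier `bioformers/bioformer-8L` is an assumption used purely for illustration; consult https://github.com/WGLab/bioformer for the actual model names and usage instructions.

```python
# Minimal sketch: loading a Bioformer checkpoint with Hugging Face transformers.
# The model identifier below is assumed for illustration; see
# https://github.com/WGLab/bioformer for the published checkpoints.
from transformers import AutoTokenizer, AutoModel

model_id = "bioformers/bioformer-8L"  # assumed Hub identifier (hypothetical)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Encode a biomedical sentence and obtain contextual token embeddings.
text = "BRCA1 mutations increase the risk of breast cancer."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# outputs.last_hidden_state has shape (batch, tokens, hidden_size) and can feed
# downstream heads for NER, relation extraction, or document classification.
print(outputs.last_hidden_state.shape)
```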