BioClinical ModernBERT
BioClinical ModernBERT is available in two sizes: base (150M parameters) and large (396M parameters). The model training checkpoints can be found here, and our code is available in our GitHub repository.
Model Summary
BioClinical ModernBERT is a domain-adapted encoder that builds on ModernBERT base and large, incorporating long-context processing and substantial improvements in speed and performance for biomedical and clinical NLP. BioClinical ModernBERT is trained on the largest biomedical and clinical corpus to date, with over 53.5 billion tokens, and addresses a key limitation of prior clinical encoders by leveraging 20 datasets from diverse institutions, domains, and geographic regions, rather than relying on data from a single source.
Usage
You can use these models directly with the `transformers` library starting from v4.48.0:
```bash
pip install -U "transformers>=4.48.0"
```
Since BioClinical ModernBERT is a Masked Language Model (MLM), you can use the `fill-mask` pipeline or load it via `AutoModelForMaskedLM`. To use BioClinical ModernBERT for downstream tasks like classification, retrieval, or QA, fine-tune it following standard BERT fine-tuning recipes.
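As an illustration of that standard recipe (a minimal sketch, not code from this repository), the snippet below fine-tunes the model for binary text classification with the Hugging Face Trainer. The CSV paths, label count, and hyperparameters are placeholders to replace with your own task, and the `datasets` package is assumed to be installed.

```python
# Illustrative sketch only: standard BERT-style fine-tuning for classification.
# The CSV paths, num_labels, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_id = "thomas-sounack/BioClinical-ModernBERT-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Placeholder CSV files with "text" and "label" columns.
dataset = load_dataset("csv", data_files={"train": "train.csv", "validation": "val.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="bioclinical-modernbert-finetuned",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    processing_class=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```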
⚠️ If your GPU supports it, we recommend using BioClinical ModernBERT with Flash Attention 2 to reach the highest efficiency. To do so, install Flash Attention as follows, then use the model as normal:
```bash
pip install flash-attn
```
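If you want to request the Flash Attention 2 backend explicitly rather than relying on automatic selection, here is a minimal sketch (it assumes a compatible CUDA GPU; `attn_implementation` is a standard `transformers` argument for choosing the attention backend):

```python
# Minimal sketch: load the model with the Flash Attention 2 backend.
# Assumes a compatible CUDA GPU and that flash-attn is installed.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "thomas-sounack/BioClinical-ModernBERT-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
).to("cuda")
```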
Using `AutoModelForMaskedLM`:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "thomas-sounack/BioClinical-ModernBERT-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = "Mitochondria is the powerhouse of the [MASK]."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# To get predictions for the mask:
masked_index = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)
predicted_token_id = outputs.logits[0, masked_index].argmax(axis=-1)
predicted_token = tokenizer.decode(predicted_token_id)
print("Predicted token:", predicted_token)
# Predicted token: cell
```
Using a pipeline:
```python
import torch
from transformers import pipeline
from pprint import pprint

pipe = pipeline(
    "fill-mask",
    model="thomas-sounack/BioClinical-ModernBERT-large",
    torch_dtype=torch.bfloat16,
)

input_text = "[MASK] is a disease caused by an uncontrolled division of abnormal cells in a part of the body."
results = pipe(input_text)
pprint(results)
```
Note: like ModernBERT, BioClinical ModernBERT does not use token type IDs, unlike some earlier BERT models. Most downstream usage is identical to standard BERT models on the Hugging Face Hub, except that you can omit the `token_type_ids` parameter.
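For example, when encoding a sentence pair (a minimal sketch with made-up clinical sentences), any `token_type_ids` the tokenizer might return can simply be dropped before the forward pass:

```python
# Minimal sketch: sentence-pair encoding. The model does not use token_type_ids,
# so drop them if present (the pop is a no-op when the tokenizer omits them).
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "thomas-sounack/BioClinical-ModernBERT-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

inputs = tokenizer(
    "Patient reports intermittent chest pain.",   # made-up example sentences
    "No acute findings on chest radiograph.",
    return_tensors="pt",
)
inputs.pop("token_type_ids", None)
outputs = model(**inputs)
```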
Training
Data
BioClinical ModernBERT is trained on 50.7B tokens of biomedical text gathered from PubMed and PMC, and 2.8B tokens of clinical text from 20 datasets which are detailed in the table below.
Name | Country | Clinical Source | Clinical Context | Samples | Tokens (M) |
---|---|---|---|---|---|
ACI-BENCH | US | Clinical Notes | Not Reported | 207 | 0.1 |
ADE Corpus | Several | Clinical Notes | Not Reported | 20,896 | 0.5 |
Brain MRI Stroke | Korea | Radiology Reports | Neurology | 2,603 | 0.2 |
CheXpert Plus | US | Radiology Reports | Pulmonology | 223,460 | 60.6 |
CHIFIR | Australia | Pathology Reports | Hematology / Oncology | 283 | 0.1 |
CORAL | US | Progress Notes | Hematology / Oncology | 240 | 0.7 |
Eye Gaze CXR | US | Radiology Reports | Pulmonology | 892 | 0.03 |
Gout Chief Complaints | US | Chief Complaint | Internal Medicine | 8,429 | 0.2 |
ID-68 | UK | Clinical Notes | Psychology | 78 | 0.02 |
Inspect | US | Radiology Reports | Pulmonology | 22,259 | 2.8 |
MedNLI | US | Clinical Notes | Internal Medicine | 14,047 | 0.5 |
MedQA | US | National Medical Board Examination | Not Reported | 14,366 | 2.0 |
MIMIC-III | US | Clinical Notes | Internal Medicine | 2,021,411 | 1,047.7 |
MIMIC-IV Note | US | Clinical Notes | Internal Medicine | 2,631,243 | 1,765.7 |
MTSamples | Not Reported | Clinical Notes | Internal Medicine | 2,358 | 1.7 |
Negex | US | Discharge Summaries | Not Reported | 2,056 | 0.1 |
PriMock57 | UK | Simulated Patient Care | Internal Medicine | 57 | 0.01 |
Q-Pain | US | Clinical Vignettes | Palliative Care | 51 | 0.01 |
REFLACX | US | Radiology Reports | Pulmonology | 2,543 | 0.1 |
Simulated Resp. Interviews | Canada | Simulated Patient Care | Pulmonology | 272 | 0.6 |
Methodology
BioClinical ModernBERT large is trained in two phases. This model is initialized from the last stable-phase checkpoint of ModernBERT large and trained with the same hyperparameters: learning rate of 5e-5 and batch size of 77.
- Phase 1: Training on 160.5B tokens from PubMed, PMC, and the 20 clinical datasets. Learning rate remains constant throughout this stage, and the masking probability is set at 30%.
- Phase 2: Training on the 20 clinical datasets only. Masking probability is reduced to 15% (the masking setup for both phases is sketched below). The model is trained for 3 epochs using a hybrid schedule: constant learning rate for the first two epochs, followed by a 1-sqrt decay in the final epoch.
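For reference, the two masking settings correspond to the standard Hugging Face MLM data collator; the snippet below is only an illustrative sketch of that configuration, not the actual training pipeline.

```python
# Illustrative sketch: the per-phase masking probabilities expressed with the
# standard MLM data collator (not the actual training code).
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-large")

# Phase 1: masking probability of 30%
phase1_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.30
)

# Phase 2: masking probability reduced to 15%
phase2_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)
```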
Evaluation
| | Model | Context Length | ChemProt | Phenotype | COS | Social History | DEID |
|---|---|---|---|---|---|---|---|
| Base | BioBERT | 512 | 89.5 | 26.6 | 94.9 | 55.8 | 74.3 |
| | Clinical BERT | 512 | 88.3 | 25.8 | 95.0 | 55.2 | 74.2 |
| | BioMed-RoBERTa | 512 | 89.0 | 36.8 | 94.9 | 55.2 | 81.1 |
| | Clinical-BigBird | 4096 | 87.4 | 26.5 | 94.0 | 53.3 | 71.2 |
| | Clinical-Longformer | 4096 | 74.2 | 46.4 | 95.2 | 56.8 | 82.3 |
| | Clinical ModernBERT | 8192 | 86.9 | 54.9 | 93.7 | 53.8 | 44.4 |
| | ModernBERT - base | 8192 | 89.5 | 48.4 | 94.0 | 53.1 | 78.3 |
| | BioClinical ModernBERT - base | 8192 | 89.9 | 58.1 | 95.1 | 58.5 | 82.7 |
| Large | ModernBERT - large | 8192 | 90.2 | 58.3 | 94.4 | 54.8 | 82.1 |
| | BioClinical ModernBERT - large | 8192 | 90.8 | 60.8 | 95.1 | 57.1 | 83.8 |
License
We release the BioClinical ModernBERT base and large model weights and training checkpoints under the MIT license.
Citation
If you use BioClinical ModernBERT in your work, please cite our preprint:
```bibtex
@misc{sounack2025bioclinicalmodernbertstateoftheartlongcontext,
      title={BioClinical ModernBERT: A State-of-the-Art Long-Context Encoder for Biomedical and Clinical NLP},
      author={Thomas Sounack and Joshua Davis and Brigitte Durieux and Antoine Chaffin and Tom J. Pollard and Eric Lehman and Alistair E. W. Johnson and Matthew McDermott and Tristan Naumann and Charlotta Lindvall},
      year={2025},
      eprint={2506.10896},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.10896},
}
```