✨ ModernBERT Large for NER

This repository hosts a ModernBERT Large model that was fine-tuned on the CoNLL-2003 NER dataset with the awesome Flair library.

Please note the following caveats:

  • ⚠️ To work around a tokenizer problem in ModernBERT, this model was fine-tuned on a forked and modified ModernBERT Large model.
  • ⚠️ At the moment, don't expect "uber" BERT-like performance; more experiments are needed. (Is RoPE causing this?)

πŸ“ Implementation

The model was trained using my ModernBERT experiments repo.

📊 Performance

A very basic hyper-parameter search was performed over five different seeds; the table reports the averaged micro F1-score on the development set of CoNLL-2003:

| Configuration          | Subword Pooling | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg.         |
|------------------------|-----------------|-------|-------|-------|-------|-------|--------------|
| `bs16-e10-cs0-lr2e-05` | first           | 96.13 | 96.44 | 96.20 | 95.93 | 96.65 | 96.27 ± 0.25 |
| `bs16-e10-cs0-lr2e-05` | **first_last**  | **96.36** | **96.58** | **96.14** | **96.19** | **96.35** | **96.32 ± 0.15** |

The performance of the currently uploaded model is marked in bold.
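The reported averages are easy to reproduce from the per-run scores. A minimal sketch in plain Python (using the run scores from the table above; note that the ± values match the *population* standard deviation):

```python
from statistics import mean, pstdev

# Dev-set micro F1-scores from the table above (five seeds per configuration)
runs = {
    "first": [96.13, 96.44, 96.20, 95.93, 96.65],
    "first_last": [96.36, 96.58, 96.14, 96.19, 96.35],
}

for pooling, scores in runs.items():
    # pstdev (population std dev) reproduces the reported ± values
    print(f"{pooling}: {mean(scores):.2f} ± {pstdev(scores):.2f}")
# → first: 96.27 ± 0.25
# → first_last: 96.32 ± 0.15
```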

📣 Usage

The following code can be used to test the model and recognize named entities in a given sentence:

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Load the model
tagger = SequenceTagger.load("stefan-it/flair-modernbert-large-ner-conll03")

# Define an example sentence
sentence = Sentence("George Washington went to Washington very fast.")

# Now let's predict named entities...
tagger.predict(sentence)

# Print the recognized named entities
print("The following named entities are found:")
for entity in sentence.get_spans('ner'):
    print(entity)
```

This outputs:

```
Span[0:2]: "George Washington" → PER (1.0000)
Span[4:5]: "Washington" → LOC (1.0000)
```
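If you need structured results rather than printed strings, the printed representation can be turned into tuples with a small helper. A minimal sketch, assuming the `Span[i:j]: "text" → LABEL (score)` format shown above (`parse_span` is a hypothetical helper, not part of Flair):

```python
import re

# Matches lines like: Span[0:2]: "George Washington" → PER (1.0000)
SPAN_RE = re.compile(r'Span\[(\d+):(\d+)\]: "(.+)" → (\S+) \(([\d.]+)\)')

def parse_span(line: str):
    """Parse one printed Flair span into (start, end, text, label, score)."""
    m = SPAN_RE.match(line.strip())
    if m is None:
        raise ValueError(f"Unrecognized span line: {line!r}")
    start, end, text, label, score = m.groups()
    return int(start), int(end), text, label, float(score)

print(parse_span('Span[0:2]: "George Washington" → PER (1.0000)'))
# → (0, 2, 'George Washington', 'PER', 1.0)
```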