ModernBERT Large for NER
This repository hosts a ModernBERT Large model that was fine-tuned on the CoNLL-2003 NER dataset with the awesome Flair library.
Please note the following caveats:
- ⚠️ To work around a tokenizer problem in ModernBERT, this model was fine-tuned on a forked and modified ModernBERT Large model.
- ⚠️ For now, don't expect BERT-level performance; more experiments are needed. (Is RoPE causing this?)
Implementation
The model was trained using my ModernBERT experiments repo.
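The exact training scripts live in that repo; the following is only a minimal sketch of how such a fine-tuning run could look with Flair, assuming the forked base model listed at the bottom of this card and the hyper-parameters from the configuration name in the results table below (batch size 16, 10 epochs, learning rate 2e-05):

```python
from flair.datasets import CONLL_03
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Load the CoNLL-2003 corpus and build the NER label dictionary
corpus = CONLL_03()
label_dict = corpus.make_label_dictionary(label_type="ner")

# Use the forked ModernBERT Large model as fine-tuneable transformer embeddings;
# subtoken_pooling corresponds to the "Subword Pooling" column in the table below
embeddings = TransformerWordEmbeddings(
    model="stefan-it/ModernBERT-large-tokenizer-fix",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

# Plain linear tag head on top of the transformer (no CRF, no RNN)
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

# Fine-tune with the hyper-parameters from the bs16-e10 / lr 2e-05 configuration
trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "resources/taggers/modernbert-large-ner-conll03",
    learning_rate=2e-05,
    mini_batch_size=16,
    max_epochs=10,
)
```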
Performance
A very basic hyper-parameter search was performed over five different seeds; the table reports the micro F1-score on the CoNLL-2003 development set for each run, together with the average:
| Configuration | Subword Pooling | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg. |
|---|---|---|---|---|---|---|---|
| bs16-e10-cs0-lr2e-05 | first | 96.13 | 96.44 | 96.20 | 95.93 | 96.65 | 96.27 ± 0.25 |
| bs16-e10-cs0-lr2e-05 | first_last | 96.36 | 96.58 | 96.14 | 96.19 | 96.35 | 96.32 ± 0.15 |
The performance of the currently uploaded model is marked in bold.
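For reference, the Avg. column appears to be the mean and (population) standard deviation over the five seeds; a small snippet to reproduce those numbers:

```python
import numpy as np

# Development-set micro F1-scores over five seeds, copied from the table above
runs = {
    "first": [96.13, 96.44, 96.20, 95.93, 96.65],
    "first_last": [96.36, 96.58, 96.14, 96.19, 96.35],
}

for pooling, scores in runs.items():
    scores = np.array(scores)
    # np.std uses the population standard deviation (ddof=0) by default
    print(f"{pooling}: {scores.mean():.2f} ± {scores.std():.2f}")
# first: 96.27 ± 0.25
# first_last: 96.32 ± 0.15
```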
Usage
The following code can be used to test the model and recognize named entities for a given sentence:
```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Load the model
tagger = SequenceTagger.load("stefan-it/flair-modernbert-large-ner-conll03")

# Define an example sentence
sentence = Sentence("George Washington went to Washington very fast.")

# Now let's predict named entities...
tagger.predict(sentence)

# Print out the recognized named entities
print("The following named entities are found:")
for entity in sentence.get_spans('ner'):
    print(entity)
```
This outputs:
```
Span[0:2]: "George Washington" → PER (1.0000)
Span[4:5]: "Washington" → LOC (1.0000)
```
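If you need the label values and confidence scores programmatically (e.g. for post-processing), something along these lines should work with recent Flair versions:

```python
# Iterate over predicted spans and read out label value and confidence
for entity in sentence.get_spans("ner"):
    label = entity.get_label("ner")
    print(entity.text, label.value, f"{label.score:.4f}")
# George Washington PER 1.0000
# Washington LOC 1.0000
```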
Base model: stefan-it/ModernBERT-large-tokenizer-fix