ModernBERT Large for NER
This repository hosts a ModernBERT Large model that was fine-tuned on the CoNLL-2003 NER dataset with the awesome Flair library.
Please note the following caveats:
- ⚠️ To work around a tokenizer problem in ModernBERT, this model was fine-tuned on a forked and modified ModernBERT Large model.
- ⚠️ For now, don't expect BERT-level performance; more experiments are needed. (Is RoPE causing this?)
Implementation
The model was trained using my ModernBERT experiments repo.
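The exact training scripts live in that repo; the following is only a minimal sketch of how such a fine-tuning run could look with Flair, assuming the forked base model listed at the bottom of this card and the hyper-parameters from the configuration name in the results table below (batch size 16, 10 epochs, learning rate 2e-05):

```python
from flair.datasets import CONLL_03
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Load the CoNLL-2003 corpus and build the NER label dictionary
corpus = CONLL_03()
label_dict = corpus.make_label_dictionary(label_type="ner")

# Use the forked ModernBERT Large model as fine-tuneable transformer embeddings;
# subtoken_pooling corresponds to the "Subword Pooling" column in the table below
embeddings = TransformerWordEmbeddings(
    model="stefan-it/ModernBERT-large-tokenizer-fix",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

# Plain linear tag head on top of the transformer (no CRF, no RNN)
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

# Fine-tune with the hyper-parameters from the bs16-e10 / lr 2e-05 configuration
trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "resources/taggers/modernbert-large-ner-conll03",
    learning_rate=2e-05,
    mini_batch_size=16,
    max_epochs=10,
)
```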
Performance
A very basic hyper-parameter search was performed over five different seeds; the table reports the micro F1-score on the CoNLL-2003 development set for each run, together with the average:
| Configuration | Subword Pooling | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg. |
|---|---|---|---|---|---|---|---|
| bs16-e10-cs0-lr2e-05 | first | 96.13 | 96.44 | 96.20 | 95.93 | 96.65 | 96.27 ± 0.25 |
| bs16-e10-cs0-lr2e-05 | first_last | 96.36 | 96.58 | 96.14 | 96.19 | 96.35 | 96.32 ± 0.15 |
The performance of the currently uploaded model is marked in bold.
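For reference, the Avg. column appears to be the mean and (population) standard deviation over the five seeds; a small snippet to reproduce those numbers:

```python
import numpy as np

# Development-set micro F1-scores over five seeds, copied from the table above
runs = {
    "first": [96.13, 96.44, 96.20, 95.93, 96.65],
    "first_last": [96.36, 96.58, 96.14, 96.19, 96.35],
}

for pooling, scores in runs.items():
    scores = np.array(scores)
    # np.std uses the population standard deviation (ddof=0) by default
    print(f"{pooling}: {scores.mean():.2f} ± {scores.std():.2f}")
# first: 96.27 ± 0.25
# first_last: 96.32 ± 0.15
```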
Usage
The following code can be used to test the model and recognize named entities for a given sentence:
```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Load the model
tagger = SequenceTagger.load("stefan-it/flair-modernbert-large-ner-conll03")

# Define an example sentence
sentence = Sentence("George Washington went to Washington very fast.")

# Now let's predict named entities...
tagger.predict(sentence)

# Print out the recognized named entities
print("The following named entities are found:")
for entity in sentence.get_spans('ner'):
    print(entity)
```
This outputs:
```
Span[0:2]: "George Washington" → PER (1.0000)
Span[4:5]: "Washington" → LOC (1.0000)
```
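If you need the label values and confidence scores programmatically (e.g. for post-processing), something along these lines should work with recent Flair versions:

```python
# Iterate over predicted spans and read out label value and confidence
for entity in sentence.get_spans("ner"):
    label = entity.get_label("ner")
    print(entity.text, label.value, f"{label.score:.4f}")
# George Washington PER 1.0000
# Washington LOC 1.0000
```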
Base model: stefan-it/ModernBERT-large-tokenizer-fix