# BERTurk-128k | HisTR NER v2

A BERTurk-128k encoder fine-tuned for Named-Entity Recognition on the HisTR corpus (person and location entities in historical Turkish).
Training was performed with the Hugging Face `run_ner.py` example script (Guide Link).


## BERTurk-128k fine-tuned on HisTR

- Strict F1 (dev): 0.873 (default seqeval metrics; a computation sketch is shown below)
- Strict F1 (Ruznamçe test, evaluated with nervaluate): to be filled
...
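For reference, the sketch below shows how a strict, span-level F1 can be computed with seqeval. The gold and predicted tag sequences are made-up placeholders, not HisTR annotations.

```python
# Minimal sketch: strict span-level F1 with seqeval.
# The tag sequences are illustrative placeholders, not HisTR data.
from seqeval.metrics import classification_report, f1_score
from seqeval.scheme import IOB2

gold = [["B-PER", "I-PER", "O", "B-LOC"],
        ["O", "B-LOC", "I-LOC", "O"]]
pred = [["B-PER", "I-PER", "O", "B-LOC"],
        ["O", "B-LOC", "O", "O"]]

# mode="strict" with an explicit scheme counts an entity as correct only if
# both its span and its type match exactly.
print(f1_score(gold, pred, mode="strict", scheme=IOB2))
print(classification_report(gold, pred, mode="strict", scheme=IOB2))
```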

| Parameter | Value |
|---|---|
| Base model | `dbmdz/bert-base-turkish-128k-cased` |
| Task | NER |
| Max sequence length | 128 |
| Train batch size | 16 |
| Eval batch size | 8 |
| Learning rate | 3 × 10⁻⁵ |
| Epochs | 5 (best at 4) |
| Optim steps | 145 |
| Gradient accumulation | 1 |
| Mixed precision | Disabled (`fp16=False`) |
| Device | 1 × A100 (40 GB) |
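The exact training command is not reproduced here. As a rough, non-authoritative illustration, the hyperparameters above would map onto `transformers.TrainingArguments` roughly as follows; the output directory is a placeholder, and `max_seq_length=128` is a `run_ner.py` data argument rather than a `TrainingArguments` field.

```python
# Rough sketch only: how the hyperparameters in the table map onto
# transformers' TrainingArguments. This is NOT the exact training command.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="berturk-histr-ner-v2",  # placeholder output directory
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    learning_rate=3e-5,
    num_train_epochs=5,
    gradient_accumulation_steps=1,
    fp16=False,  # mixed precision disabled
)
# max_seq_length=128 is passed to run_ner.py separately as a data argument.
```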

## Example use

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "cihanunlu/BERTurk_HisTR_NER_v2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

ner = pipeline("ner", model=model, tokenizer=tokenizer,
               aggregation_strategy="simple")

sentence = "Mevlânâzâde Ahmed Hulûsî Efendi, 1293 senesinde Haleb vilâyetine memur edilmiştir."
print(ner(sentence))
```
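If you prefer not to use the pipeline helper, the same prediction can be made manually. The sketch below is a generic token-classification pattern, not code shipped with this model; it reuses the `tokenizer`, `model`, and `sentence` objects from the snippet above and simply assigns each sub-token its highest-scoring label.

```python
# Generic token-classification inference without the pipeline helper
# (a minimal sketch, reusing tokenizer, model, and sentence from above).
import torch

inputs = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits

pred_ids = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# Print each sub-token with its predicted label (special tokens included).
for token, pred_id in zip(tokens, pred_ids):
    print(token, model.config.id2label[int(pred_id)])
```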