# BERTurk-128k | HisTR NER v2
A BERTurk-128k encoder fine-tuned for Named-Entity Recognition on the HisTR corpus (person and location entities in historical Turkish).
Training was performed with the Hugging Face `run_ner.py` example script from the `transformers` token-classification examples.
## Results

- Strict F1 (dev): 0.873 (computed with seqeval's default metrics)
- Strict F1 (Ruznamçe test, scored with nervaluate): to be filled
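The dev figure is an entity-level F1: a predicted entity counts only when both its span and its type match the gold annotation. A minimal sketch of how such a score is computed with seqeval, using made-up tag sequences rather than HisTR data:

```python
from seqeval.metrics import f1_score

# Toy BIO-tagged sequences (illustrative only, not HisTR data):
# gold and predicted labels for two short sentences.
y_true = [["B-PER", "I-PER", "O", "B-LOC"], ["O", "B-LOC", "O"]]
y_pred = [["B-PER", "I-PER", "O", "B-LOC"], ["O", "B-PER", "O"]]

# seqeval scores complete entity spans: an entity is correct only if
# both its boundaries and its type match the gold annotation.
print(f1_score(y_true, y_pred))  # ~0.667 here: two of three gold entities matched
```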
## Training hyperparameters

| Parameter | Value |
|---|---|
| Base model | `dbmdz/bert-base-turkish-128k-cased` |
| Task | NER (token classification) |
| Max sequence length | 128 |
| Train batch size | 16 |
| Eval batch size | 8 |
| Learning rate | 3 × 10⁻⁵ |
| Epochs | 5 (best checkpoint at epoch 4) |
| Optimization steps | 145 |
| Gradient accumulation | 1 |
| Mixed precision | disabled (`fp16=False`) |
| Device | 1 × A100 (40 GB) |
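For orientation, the table maps onto `transformers` `TrainingArguments` roughly as below. This is a hedged sketch rather than the exact training invocation: the output directory is a placeholder, and the max sequence length of 128 is applied at tokenization time by `run_ner.py`, not through `TrainingArguments`.

```python
from transformers import TrainingArguments

# Sketch of the fine-tuning configuration implied by the table above.
training_args = TrainingArguments(
    output_dir="berturk-histr-ner",   # placeholder, not the original path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    learning_rate=3e-5,
    num_train_epochs=5,
    gradient_accumulation_steps=1,
    fp16=False,                       # mixed precision disabled
)
```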
## Example use
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "cihanunlu/BERTurk_HisTR_NER_v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# Merge sub-word token predictions into whole entity spans
ner = pipeline("ner", model=model, tokenizer=tokenizer,
               aggregation_strategy="simple")

sentence = "Mevlânâzâde Ahmed Hulûsî Efendi, 1293 senesinde Haleb vilâyetine memur edilmiştir."
print(ner(sentence))
```
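With `aggregation_strategy="simple"`, the pipeline merges sub-word token predictions back into whole entities, so the call returns a list of dicts with `entity_group`, `score`, `word`, `start`, and `end` fields, one per predicted person or location span.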