## Model Description

This model is a fine-tuned version of xlm-roberta-base for named entity recognition (NER), trained on the English and Turkish PAN-X subsets of the google/xtreme dataset.
## Label Scheme

- O: the token does not belong to any entity.
- B-PER / I-PER: the token is the beginning of / inside a person entity.
- B-ORG / I-ORG: the token is the beginning of / inside an organization entity.
- B-LOC / I-LOC: the token is the beginning of / inside a location entity.
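As an illustrative sketch of the BIO scheme (hand-labeled, not model output), the sentence from the usage example below would be tagged like this:

```python
# Hand-labeled BIO tags for an example sentence (illustration, not model output).
tokens = ["Mustafa", "Kemal", "Atatürk", "1881", "yılında", "Selanik'te", "doğdu", "."]
labels = ["B-PER",   "I-PER", "I-PER",   "O",    "O",       "B-LOC",      "O",     "O"]
```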
## Evaluation

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| No log | 1.0 | 417 | 0.1159 | 0.9689 | 0.9042 | 0.9274 | 0.9157 |
| 0.0895 | 2.0 | 834 | 0.1148 | 0.9707 | 0.9185 | 0.9228 | 0.9207 |
| 0.0895 | 3.0 | 1251 | 0.1209 | 0.9714 | 0.9171 | 0.9311 | 0.9241 |
| 0.0485 | 4.0 | 1668 | 0.1222 | 0.9725 | 0.9212 | 0.9335 | 0.9273 |
Validation F1 improves steadily across epochs, reaching 0.9273 at epoch 4, while validation loss rises only slightly after epoch 2 (0.1148 to 0.1222), so the model generalizes well without significant overfitting.
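These span-level metrics are the kind computed by the seqeval library, which the Hugging Face token-classification examples commonly use (the card does not state the exact evaluation code). A minimal sketch, assuming seqeval is installed and using toy label sequences for illustration:

```python
from seqeval.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy gold and predicted label sequences, one list per sentence (illustrative only).
y_true = [["B-PER", "I-PER", "O", "B-LOC", "O"]]
y_pred = [["B-PER", "I-PER", "O", "B-LOC", "O"]]

print("accuracy: ", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("f1:       ", f1_score(y_true, y_pred))
```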
## Usage Example

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model = AutoModelForTokenClassification.from_pretrained("mehmet0sahinn/xlm-roberta-base-cased-ner-turkish")
tokenizer = AutoTokenizer.from_pretrained("mehmet0sahinn/xlm-roberta-base-cased-ner-turkish")

# Build an NER pipeline that merges subword pieces into whole entities.
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

text = "Mustafa Kemal Atatürk 1881 yılında Selanik'te doğdu."
ner_results = nlp(text)

for entity in ner_results:
    print(entity)
```
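With `aggregation_strategy="simple"`, subword pieces are merged back into whole words, and each result is a dict with `entity_group`, `score`, `word`, `start`, and `end` keys. For the sentence above, the output should look roughly like this (scores and exact spans are illustrative, not actual model output):

```python
# Illustrative output shape, not actual model output:
# {'entity_group': 'PER', 'score': 0.99, 'word': 'Mustafa Kemal Atatürk', 'start': 0, 'end': 21}
# {'entity_group': 'LOC', 'score': 0.99, 'word': 'Selanik', 'start': 35, 'end': 42}
```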
## Dataset

- Source: PAN-X subset of google/xtreme
- Languages: English, Turkish
- Training size: 20,000 (EN) + 20,000 (TR) rows
- Validation size: 10,000 (EN) + 10,000 (TR) rows
- Test size: 10,000 (EN) + 10,000 (TR) rows
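A minimal sketch of loading and combining the two language subsets with the datasets library; the exact preprocessing used to train this model is not documented here:

```python
from datasets import load_dataset, concatenate_datasets

# PAN-X configs in XTREME are named per language.
panx_en = load_dataset("xtreme", "PAN-X.en")
panx_tr = load_dataset("xtreme", "PAN-X.tr")

# Merge English and Turkish examples split by split.
train = concatenate_datasets([panx_en["train"], panx_tr["train"]])
validation = concatenate_datasets([panx_en["validation"], panx_tr["validation"]])
test = concatenate_datasets([panx_en["test"], panx_tr["test"]])

print(len(train), len(validation), len(test))  # 40000 20000 20000
```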
## Base Model

- FacebookAI/xlm-roberta-base