Model Description

This model is a fine-tuned version of xlm-roberta-base (~277M parameters, F32) on the English and Turkish PAN-X subsets of the google/xtreme dataset.


Label Scheme

  • O means the word doesn’t correspond to any entity.
  • B-PER/I-PER means the word corresponds to the beginning of/is inside a person entity.
  • B-ORG/I-ORG means the word corresponds to the beginning of/is inside an organization entity.
  • B-LOC/I-LOC means the word corresponds to the beginning of/is inside a location entity.
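
A minimal sketch of the id/label mapping this IOB2 scheme implies. The index order below follows the PAN-X ClassLabel definition in google/xtreme; treat it as an assumption unless verified against model.config.id2label:

from transformers import AutoModelForTokenClassification

# Assumed PAN-X label order; verify against the checkpoint's config
labels = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]
id2label = {i: label for i, label in enumerate(labels)}
label2id = {label: i for i, label in enumerate(labels)}

model = AutoModelForTokenClassification.from_pretrained("mehmet0sahinn/xlm-roberta-base-cased-ner-turkish")
print(model.config.id2label)  # compare against the assumed mapping above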

Evaluation

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
|---------------|-------|------|-----------------|----------|-----------|--------|--------|
| No log        | 1.0   | 417  | 0.1159          | 0.9689   | 0.9042    | 0.9274 | 0.9157 |
| 0.0895        | 2.0   | 834  | 0.1148          | 0.9707   | 0.9185    | 0.9228 | 0.9207 |
| 0.0895        | 3.0   | 1251 | 0.1209          | 0.9714   | 0.9171    | 0.9311 | 0.9241 |
| 0.0485        | 4.0   | 1668 | 0.1222          | 0.9725   | 0.9212    | 0.9335 | 0.9273 |

Validation F1 improves steadily across the four epochs (0.9157 → 0.9273) while validation loss stays nearly flat, indicating the model generalizes well without significant overfitting.
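
The precision, recall, and F1 columns are presumably entity-level scores of the kind seqeval computes over IOB2 tag sequences; the card does not confirm the tooling, so the snippet below is a sketch under that assumption:

from seqeval.metrics import f1_score, precision_score, recall_score

# Gold and predicted label sequences, one list of IOB2 tags per sentence
y_true = [["B-PER", "I-PER", "O", "B-LOC", "O"]]
y_pred = [["B-PER", "I-PER", "O", "B-LOC", "O"]]

# Entity-level scores: a prediction counts only if the full span and type match
print(precision_score(y_true, y_pred))
print(recall_score(y_true, y_pred))
print(f1_score(y_true, y_pred))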


Usage Example

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model = AutoModelForTokenClassification.from_pretrained("mehmet0sahinn/xlm-roberta-base-cased-ner-turkish")
tokenizer = AutoTokenizer.from_pretrained("mehmet0sahinn/xlm-roberta-base-cased-ner-turkish")

# "simple" aggregation merges subword pieces into whole entity spans
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

text = "Mustafa Kemal Atatürk 1881 yılında Selanik'te doğdu."
ner_results = nlp(text)

# Each entity is a dict with entity_group, score, word, start, and end
for entity in ner_results:
    print(entity)
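
For token-level control rather than aggregated spans, a minimal sketch of running the same model without the pipeline helper, reusing model, tokenizer, and text from above and decoding predictions via the config's id2label map:

import torch

inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, num_labels)

# Pick the highest-scoring label id for each token
pred_ids = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# Print each subword token alongside its predicted IOB2 tag
for token, pred in zip(tokens, pred_ids):
    print(token, model.config.id2label[pred.item()])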

Dataset

  • Source: PAN-X subset of the google/xtreme dataset
  • Languages: English, Turkish
  • Training size: 20K (EN) + 20K (TR) rows
  • Validation size: 10K (EN) + 10K (TR) rows
  • Test size: 10K (EN) + 10K (TR) rows
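
A minimal sketch of assembling the bilingual training split described above with the datasets library; the shuffle seed and the concatenation step are assumptions about the training setup, not details stated on the card:

from datasets import concatenate_datasets, load_dataset

# Per-language PAN-X configs in google/xtreme; each train split has 20K rows
en = load_dataset("google/xtreme", "PAN-X.en")
tr = load_dataset("google/xtreme", "PAN-X.tr")

# Combine English and Turkish examples into a single shuffled training set
train = concatenate_datasets([en["train"], tr["train"]]).shuffle(seed=42)
print(train)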
