# Kurdish NER with XLM-R
This is a fine-tuned `xlm-roberta-base` model for Named Entity Recognition (NER) in Kurmanji Kurdish. It was trained on a manually annotated dataset of over 8,000 sentences. The model identifies the following entity types:
- PER: Person
- LOC: Location
- ORG: Organization
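
Under the hood the checkpoint predicts tags over these three types; the exact label names (e.g. a BIO scheme) are an assumption here, and the authoritative mapping ships with the checkpoint and can be inspected directly:

```python
from transformers import AutoConfig

# Inspect the label inventory stored in the model config.
# Expected to be a BIO scheme over the three types above,
# e.g. O, B-PER, I-PER, B-LOC, I-LOC, B-ORG, I-ORG.
config = AutoConfig.from_pretrained("akam-ot/ku-ner-xlmr")
print(config.id2label)
```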
## Model Details
- Base model: `xlm-roberta-base` (270M parameters)
- Fine-tuning hyperparameters (see the `TrainingArguments` sketch after this list):
  - Epochs: 5
  - Batch size: 16
  - Max sequence length: 128 tokens
  - Optimizer: AdamW
  - Learning rate: 2e-5
  - Warmup steps: 500
  - Weight decay: 0.01
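
The hyperparameters above map directly onto Hugging Face `TrainingArguments`. A minimal reconstruction sketch, assuming the standard `Trainer` workflow (the original training script is not published, so the output directory and optimizer flag are illustrative):

```python
from transformers import TrainingArguments

# Fine-tuning settings reported above, expressed as TrainingArguments.
# The 128-token limit is applied at tokenization time via
# tokenizer(..., truncation=True, max_length=128).
args = TrainingArguments(
    output_dir="ku-ner-xlmr",        # illustrative output directory
    num_train_epochs=5,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    warmup_steps=500,
    weight_decay=0.01,
    optim="adamw_torch",             # AdamW optimizer
)
```

These arguments would then be passed to a `Trainer` together with the tokenized training and evaluation splits of the annotated dataset.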
## Intended Use
- Extract named entities from Kurmanji Kurdish text (news, social media, etc.)
- Aid in information extraction, digital humanities, and low-resource language research
## Evaluation Metrics

Test set: 1,630 sentences (≈26k tokens)
| Entity  | Precision | Recall | F1 Score |
|---------|-----------|--------|----------|
| PER     | 0.8719    | 0.8666 | 0.8692   |
| LOC     | 0.8817    | 0.8825 | 0.8821   |
| ORG     | 0.7280    | 0.7930 | 0.7591   |
| Overall | 0.8325    | 0.8511 | 0.8414   |
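
Entity-level scores of this kind are conventionally computed with span-based sequence-labeling metrics such as those in `seqeval`; whether the authors used that exact library is an assumption. A toy sketch of the computation:

```python
from seqeval.metrics import classification_report

# Toy gold and predicted BIO sequences, purely to illustrate how
# entity-level precision/recall/F1 are derived; the real test set
# contains 1,630 sentences.
y_true = [["B-PER", "O", "O", "B-LOC", "I-LOC", "O"]]
y_pred = [["B-PER", "O", "O", "B-LOC", "O", "O"]]
print(classification_report(y_true, y_pred, digits=4))
```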
## Try it Online

**Streamlit Demo**: paste a sentence in Kurmanji Kurdish (Latin script) and explore the model's predictions in your browser.
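
A demo like this can be built in a few lines on top of the `transformers` pipeline. A minimal sketch (the actual demo's source is not part of this repository, so the layout and caching choices here are illustrative):

```python
# app.py - illustrative Streamlit front-end for the NER model
import streamlit as st
from transformers import pipeline

@st.cache_resource  # load the model once and reuse it across reruns
def load_ner():
    return pipeline(
        "ner", model="akam-ot/ku-ner-xlmr", aggregation_strategy="simple"
    )

st.title("Kurdish NER (Kurmanji)")
text = st.text_area("Enter a sentence in Kurmanji Kurdish (Latin script):")
if st.button("Analyze") and text:
    for ent in load_ner()(text):
        st.write(f"{ent['word']} → {ent['entity_group']} ({ent['score']:.2f})")
```

Run it locally with `streamlit run app.py`.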
## How to Use

You can also load and use the model via Hugging Face 🤗 Transformers:
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load model and tokenizer
model_id = "akam-ot/ku-ner-xlmr"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# Create NER pipeline
ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Example sentence
sentence = "Navê min Hejar e û ez li Hewlêr dijîm."

# Run NER
results = ner(sentence)

# Display results
for ent in results:
    print(f"{ent['word']} → {ent['entity_group']} (score: {ent['score']:.2f})")
```
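
The pipeline hides the token-level details: it tokenizes the sentence into subwords, runs a forward pass, takes the argmax over the per-token label logits, and merges subwords into entity spans because of `aggregation_strategy="simple"`. A sketch of the same steps done manually (no aggregation, so subword pieces are printed individually):

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_id = "akam-ot/ku-ner-xlmr"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

inputs = tokenizer("Navê min Hejar e û ez li Hewlêr dijîm.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, num_labels)
pred_ids = logits.argmax(dim=-1)[0]

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred in zip(tokens, pred_ids):
    print(token, model.config.id2label[pred.item()])
```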