Tamil Named Entity Recognition

Fine-tuning bert-base-multilingual-cased on Wikiann dataset for performing NER on Tamil language.

Label ID and its corresponding label name

Label ID Label Name
0 O
1 B-PER
2 I-PER
3 B-ORG
4 I-ORG
5 B-LOC
6 I-LOC

Results

Step Training Loss Validation Loss Overall Precision Overall Recall Overall F1 Overall Accuracy Loc F1 Org F1 Per F1
1000 0.386900 0.300006 0.833469 0.824748 0.829086 0.912857 0.835343 0.781625 0.867752
2000 0.210200 0.251389 0.845455 0.842052 0.843750 0.924861 0.851711 0.790198 0.886515
3000 0.140000 0.264964 0.866952 0.856137 0.861510 0.930141 0.874872 0.818150 0.885203
4000 0.095400 0.298542 0.860871 0.882696 0.871647 0.935692 0.881348 0.829285 0.899245
5000 0.062200 0.296011 0.871805 0.878471 0.875125 0.938806 0.875434 0.850889 0.898148
6000 0.042200 0.320418 0.868416 0.879074 0.873713 0.937497 0.877588 0.833611 0.907737

Example

from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
tokenizer = AutoTokenizer.from_pretrained("Ambareeshkumar/BERT-Tamil")
model = AutoModelForTokenClassification.from_pretrained("Ambareeshkumar/BERT-Tamil")
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "இந்திய"
ner_results = nlp(example)
ner_results
Downloads last month
342
Inference Providers NEW
This model is not currently available via any of the supported third-party Inference Providers, and the model is not deployed on the HF Inference API.

Dataset used to train Ambareeshkumar/BERT-Tamil