# Gujarati-BERT-NER 🪔🇮🇳
This is a fine-tuned Named Entity Recognition (NER) model for Gujarati, built on the GujaratiBERT model and trained on the Naamapadam dataset.
## 🏷️ Named Entity Types
- PER - Person (વ્યક્તિનું નામ)
- LOC - Location (સ્થળ)
- ORG - Organization (સંસ્થા)
- O - Non-entity tokens
## 📌 Example Usage
```python
from transformers import pipeline

# Load the fine-tuned model and tokenizer as a token-classification pipeline.
ner_pipeline = pipeline(
    "token-classification",
    model="Kantkamal/Gujarati-BERT-NER",
    tokenizer="Kantkamal/Gujarati-BERT-NER",
    aggregation_strategy="simple",
)

text = "મહાત્મા ગાંધીજી પોરબંદર ખાતે જન્મ્યા હતા અને તેમને અહિંસા માટે જાણવામાં આવે છે."
ner_results = ner_pipeline(text)

# Print each detected entity with its aggregated label and confidence score.
for entity in ner_results:
    print(f"{entity['word']} -> {entity['entity_group']} ({entity['score']:.2f})")
```
## Label and ID Mapping

| Label ID | Label |
|---|---|
| 0 | O |
| 1 | B-PER |
| 2 | I-PER |
| 3 | B-ORG |
| 4 | I-ORG |
| 5 | B-LOC |
| 6 | I-LOC |
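As a quick check, this mapping can be read straight from the checkpoint's configuration; a minimal sketch, assuming the checkpoint stores the mapping in its config as Transformers token-classification models normally do:

```python
from transformers import AutoConfig

# Print the ID -> label mapping stored in the checkpoint's config.
config = AutoConfig.from_pretrained("Kantkamal/Gujarati-BERT-NER")
for label_id, label in sorted(config.id2label.items()):
    print(label_id, label)
```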
## 🚀 Load Manually
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained("Kantkamal/Gujarati-BERT-NER")
tokenizer = AutoTokenizer.from_pretrained("Kantkamal/Gujarati-BERT-NER")
```
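Once loaded this way, the model can also be run without the pipeline; a minimal, illustrative sketch that relies on the `id2label` mapping shown above:

```python
import torch

# Tokenize a sample sentence and run a forward pass without gradients.
text = "મહાત્મા ગાંધીજી પોરબંદર ખાતે જન્મ્યા હતા."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Take the highest-scoring label ID per token and map it back to a tag.
pred_ids = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred in zip(tokens, pred_ids):
    print(token, model.config.id2label[pred])
```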
## Training Details
The training split of the Naamapadam dataset for Gujarati contains 472,845 rows in total. A random sample of 400,000 rows was drawn from it: 320,000 rows (80%) were used for model training and 80,000 rows (20%) were held out for validation. Training was carried out on a local system equipped with an NVIDIA GeForce RTX 4060 GPU.
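The sampling and split could be reproduced roughly as follows; this is a hedged sketch, where the Hub dataset id `ai4bharat/naamapadam`, the `"gu"` config, and the seeds are assumptions rather than the exact procedure used:

```python
from datasets import load_dataset

# Assumed dataset id/config and seeds; the actual run may have differed.
ds = load_dataset("ai4bharat/naamapadam", "gu", split="train")  # 472,845 rows
sample = ds.shuffle(seed=42).select(range(400_000))             # random 400k sample
split = sample.train_test_split(test_size=0.2, seed=42)         # 80/20 split
train_ds, val_ds = split["train"], split["test"]                # 320k / 80k rows
```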
## Evaluation Results
| Model | Precision | Recall | F1 | Accuracy | Loss |
|---|---|---|---|---|---|
| Gujarati-BERT-NER | 0.8052 | 0.8424 | 0.8234 | 0.9244 | 0.1985 |
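Scores of this kind are typically computed at the entity level with seqeval; a toy sketch under that assumption (not the actual evaluation run):

```python
from seqeval.metrics import precision_score, recall_score, f1_score, accuracy_score

# Toy gold/predicted tag sequences, just to show the metric calls.
y_true = [["B-PER", "I-PER", "O", "B-LOC", "O"]]
y_pred = [["B-PER", "I-PER", "O", "B-LOC", "O"]]

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("f1:       ", f1_score(y_true, y_pred))
print("accuracy: ", accuracy_score(y_true, y_pred))
```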
## 🔍 Try it out
મહાત્મા ગાંધીજી પોરબંદર ખાતે જન્મ્યા હતા અને તેઓ અહિંસા માટે જાણીતા છે.
Developed by Chandrakant Bhogayata
**Base model:** [l3cube-pune/gujarati-bert](https://huggingface.co/l3cube-pune/gujarati-bert)