
Gujarati-BERT-NER 🪔🇮🇳

This is a fine-tuned Named Entity Recognition (NER) model for the Gujarati language, based on the GujaratiBERT model and trained on the Naamapadam dataset.

🏷️ Named Entity Types

  • PER - Person (વ્યક્તિનું નામ)
  • LOC - Location (સ્થળ)
  • ORG - Organization (સંસ્થા)
  • O - Non-entity tokens

📌 Example Usage

from transformers import pipeline

ner_pipeline = pipeline(
    "token-classification",
    model="Kantkamal/Gujarati-BERT-NER",
    tokenizer="Kantkamal/Gujarati-BERT-NER",
    aggregation_strategy="simple"
)

text = "મહાત્મા ગાંધીજી પોરબંદર ખાતે જન્મ્યા હતા અને તેમને અહિંસા માટે જાણવામાં આવે છે."
ner_results = ner_pipeline(text)

for entity in ner_results:
    print(f"{entity['word']} -> {entity['entity_group']} ({entity['score']:.2f})") 

Label and ID Mapping

| Label ID | Label |
|----------|-------|
| 0 | O |
| 1 | B-PER |
| 2 | I-PER |
| 3 | B-ORG |
| 4 | I-ORG |
| 5 | B-LOC |
| 6 | I-LOC |
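To illustrate how these BIO tags combine into entity spans (roughly what `aggregation_strategy="simple"` does in the pipeline above), here is a minimal sketch. The token list and tags below are made-up inputs for illustration, not actual model output:

```python
# Minimal BIO-tag grouping sketch: merge B-/I- tagged tokens into entity spans.
# Tokens and tags here are illustrative inputs, not real model predictions.

def group_entities(tokens, tags):
    """Group (token, BIO-tag) pairs into (entity_text, entity_type) spans."""
    entities = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new span, closing any open one.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open span.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open span.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities

tokens = ["મહાત્મા", "ગાંધીજી", "પોરબંદર", "ખાતે", "જન્મ્યા", "હતા"]
tags = ["B-PER", "I-PER", "B-LOC", "O", "O", "O"]
print(group_entities(tokens, tags))
# [('મહાત્મા ગાંધીજી', 'PER'), ('પોરબંદર', 'LOC')]
```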

🚀 Load Manually

from transformers import AutoTokenizer, AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained("Kantkamal/Gujarati-BERT-NER")
tokenizer = AutoTokenizer.from_pretrained("Kantkamal/Gujarati-BERT-NER") 
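When loading manually, the model returns per-token logits over the seven labels, which you decode yourself. The sketch below shows that decoding step using the card's label mapping and dummy logits standing in for `model(**inputs).logits`; no real forward pass is run here:

```python
# Sketch: decode per-token logits into tag strings via the card's id2label mapping.
# The logits below are dummy values standing in for model(**inputs).logits.
id2label = {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-ORG", 4: "I-ORG", 5: "B-LOC", 6: "I-LOC"}

def decode(logits_per_token):
    """Take the argmax label id for each token and map it to its tag string."""
    return [id2label[max(range(len(row)), key=row.__getitem__)]
            for row in logits_per_token]

# Two fake token rows: the first scores highest on B-PER (id 1), the second on O (id 0).
dummy_logits = [
    [0.1, 4.2, 0.3, 0.0, 0.0, 0.2, 0.1],
    [3.9, 0.2, 0.1, 0.4, 0.0, 0.3, 0.2],
]
print(decode(dummy_logits))  # ['B-PER', 'O']
```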

Training Details

The training split of the Naamapadam dataset for Gujarati contains a total of 472,845 rows. A random sample of 400,000 rows was drawn from it; 320,000 rows (80%) were allocated for model training, while the remaining 80,000 rows (20%) were designated for validation. Model training was conducted on a local system equipped with an NVIDIA GeForce RTX 4060.
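The sampling and split described above can be sketched in a few lines. This is an illustration of the arithmetic, not the original training script; the random seed is arbitrary, since the card does not state one:

```python
import random

total_rows = 472_845   # size of the Naamapadam Gujarati training split
sample_size = 400_000  # random sample used for this model

random.seed(0)  # illustrative seed; the original seed is not stated on the card
sampled = random.sample(range(total_rows), sample_size)

train_size = int(sample_size * 0.80)
train_idx, val_idx = sampled[:train_size], sampled[train_size:]
print(len(train_idx), len(val_idx))  # 320000 80000
```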

Evaluation Results

| Model | Precision | Recall | F1 | Accuracy | Loss |
|-------|-----------|--------|----|----------|------|
| Gujarati-BERT-NER | 0.8052 | 0.8424 | 0.8234 | 0.9244 | 0.1985 |
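As a quick sanity check, the reported F1 is consistent with the harmonic mean of the reported precision and recall:

```python
# F1 is the harmonic mean of precision and recall.
precision, recall = 0.8052, 0.8424
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.8234, matching the table above
```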

🔍 Try it out

મહાત્મા ગાંધીજી પોરબંદર ખાતે જન્મ્યા હતા અને તેઓ અહિંસા માટે જાણીતા છે.


Developed by Chandrakant Bhogayata
Model size: 237M parameters (F32, safetensors)