# Gujarati-BERT-NER 🪔🇮🇳
This is a fine-tuned Named Entity Recognition (NER) model for Gujarati, built on the GujaratiBERT model and trained on the Naamapadam dataset.
## 🏷️ Named Entity Types
- PER - Person (વ્યક્તિનું નામ)
- LOC - Location (સ્થળ)
- ORG - Organization (સંસ્થા)
- O - Non-entity tokens
## 📌 Example Usage
```python
from transformers import pipeline

# Load the fine-tuned model and tokenizer as a token-classification pipeline.
ner_pipeline = pipeline(
    "token-classification",
    model="Kantkamal/Gujarati-BERT-NER",
    tokenizer="Kantkamal/Gujarati-BERT-NER",
    aggregation_strategy="simple",
)

text = "મહાત્મા ગાંધીજી પોરબંદર ખાતે જન્મ્યા હતા અને તેમને અહિંસા માટે જાણવામાં આવે છે."
ner_results = ner_pipeline(text)

# Print each detected entity with its aggregated label and confidence score.
for entity in ner_results:
    print(f"{entity['word']} -> {entity['entity_group']} ({entity['score']:.2f})")
```
## Label and ID Mapping

| Label ID | Label |
|---|---|
| 0 | O |
| 1 | B-PER |
| 2 | I-PER |
| 3 | B-ORG |
| 4 | I-ORG |
| 5 | B-LOC |
| 6 | I-LOC |
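As a quick check, this mapping can be read straight from the checkpoint's configuration; a minimal sketch, assuming the checkpoint stores the mapping in its config as Transformers token-classification models normally do:

```python
from transformers import AutoConfig

# Print the ID -> label mapping stored in the checkpoint's config.
config = AutoConfig.from_pretrained("Kantkamal/Gujarati-BERT-NER")
for label_id, label in sorted(config.id2label.items()):
    print(label_id, label)
```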
## 🚀 Load Manually
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained("Kantkamal/Gujarati-BERT-NER")
tokenizer = AutoTokenizer.from_pretrained("Kantkamal/Gujarati-BERT-NER")
```
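Once loaded this way, the model can also be run without the pipeline; a minimal, illustrative sketch that relies on the `id2label` mapping shown above:

```python
import torch

# Tokenize a sample sentence and run a forward pass without gradients.
text = "મહાત્મા ગાંધીજી પોરબંદર ખાતે જન્મ્યા હતા."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Take the highest-scoring label ID per token and map it back to a tag.
pred_ids = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred in zip(tokens, pred_ids):
    print(token, model.config.id2label[pred])
```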
## Training Details
The training split of the Naamapadam dataset for Gujarati contains 472,845 rows in total. A random sample of 400,000 rows was drawn from it: 320,000 rows (80%) were used for model training and 80,000 rows (20%) were held out for validation. Training was carried out on a local system equipped with an NVIDIA GeForce RTX 4060 GPU.
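The sampling and split could be reproduced roughly as follows; this is a hedged sketch, where the Hub dataset id `ai4bharat/naamapadam`, the `"gu"` config, and the seeds are assumptions rather than the exact procedure used:

```python
from datasets import load_dataset

# Assumed dataset id/config and seeds; the actual run may have differed.
ds = load_dataset("ai4bharat/naamapadam", "gu", split="train")  # 472,845 rows
sample = ds.shuffle(seed=42).select(range(400_000))             # random 400k sample
split = sample.train_test_split(test_size=0.2, seed=42)         # 80/20 split
train_ds, val_ds = split["train"], split["test"]                # 320k / 80k rows
```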
## Evaluation Results
| Model | Precision | Recall | F1 | Accuracy | Loss |
|---|---|---|---|---|---|
| Gujarati-BERT-NER | 0.8052 | 0.8424 | 0.8234 | 0.9244 | 0.1985 |
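Scores of this kind are typically computed at the entity level with seqeval; a toy sketch under that assumption (not the actual evaluation run):

```python
from seqeval.metrics import precision_score, recall_score, f1_score, accuracy_score

# Toy gold/predicted tag sequences, just to show the metric calls.
y_true = [["B-PER", "I-PER", "O", "B-LOC", "O"]]
y_pred = [["B-PER", "I-PER", "O", "B-LOC", "O"]]

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("f1:       ", f1_score(y_true, y_pred))
print("accuracy: ", accuracy_score(y_true, y_pred))
```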
## 🔍 Try it out
મહાત્મા ગાંધીજી પોરબંદર ખાતે જન્મ્યા હતા અને તેઓ અહિંસા માટે જાણીતા છે.
Developed by Chandrakant Bhogayata
**Base model:** [l3cube-pune/gujarati-bert](https://huggingface.co/l3cube-pune/gujarati-bert)