---
license: apache-2.0
datasets:
  - custom
  - chatgpt
language:
  - en
metrics:
  - precision
  - recall
  - f1
  - accuracy
pipeline_tag: token-classification
library_name: transformers
new_version: v1.1
tags:
  - token-classification
  - ner
  - named-entity-recognition
  - text-classification
  - sequence-labeling
  - transformer
  - bert
  - nlp
  - pretrained-model
  - dataset-finetuning
  - deep-learning
  - huggingface
  - conll2012
  - real-time-inference
  - efficient-nlp
  - high-accuracy
  - gpu-optimized
  - chatbot
  - information-extraction
  - search-enhancement
  - knowledge-graph
  - legal-nlp
  - medical-nlp
  - financial-nlp
base_model:
  - boltuix/bert-lite
---


# 🌟 Boltuix BERT-NER Model 🌟

## 🚀 Model Details

### 🌈 Description

- ✨ Fine-tuned for Named Entity Recognition (NER)
- 📚 Dataset: CoNLL-2012
- 🔍 Recognizes 37 entity labels (18 entity types in BIO format, plus `O`) spanning people, places, organizations, laws, events, and more
- 💬 Suited to both sentence-level and document-level tagging in English
- 🧠 Training examples: 115,812 | ✅ Validation: 15,680 | 🧪 Test: 12,217

### 🔧 Info

- Developer: Boltuix 🧙‍♂️
- Fuel: Passion 🧠
- License: Apache 2.0 📜
- Language: English 🇬🇧
- Type: Transformer-based Token Classification 🤖
- Version: v1.0 🎈
- Trained: Before March 27, 2025

### 🔗 Links


## 🎯 Use Cases for NER

### 🌟 Direct Applications

- Extracting names, places, and dates from news, blogs, and reports
- Powering chatbots with contextual awareness
- Enhancing search with semantic understanding
- Building dynamic knowledge graphs (see the sketch below)
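
To make the knowledge-graph item concrete, here is a minimal sketch that groups pipeline-style NER output (see Getting Started below for how to produce it) into per-type node sets. The `entities` list and the graph structure are hypothetical stand-ins, not part of the original card.

```python
from collections import defaultdict

# Hypothetical pipeline-style output for one sentence; in practice this
# would come from the token-classification pipeline shown in Getting Started.
entities = [
    {"entity_group": "PERSON", "word": "barack obama"},
    {"entity_group": "ORG", "word": "microsoft"},
    {"entity_group": "GPE", "word": "seattle"},
]

# Group entity mentions by type to form simple knowledge-graph node sets.
graph = defaultdict(set)
for entity in entities:
    graph[entity["entity_group"]].add(entity["word"])

print(dict(graph))  # {'PERSON': {'barack obama'}, 'ORG': {'microsoft'}, 'GPE': {'seattle'}}
```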

### 🌱 Downstream Tasks

- Medical & legal domain adaptation
- Multilingual extensions (with retraining)
- Custom entity sets for finance, e-commerce, etc. (as sketched below)
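
For custom entity sets, one common recipe (a sketch, not the published training procedure) is to reload the checkpoint with a new label list and let `ignore_mismatched_sizes=True` re-initialize the classification head before fine-tuning on domain data. The finance labels below are hypothetical.

```python
from transformers import AutoModelForTokenClassification

# Hypothetical custom label set for a finance domain (BIO format)
custom_labels = ["O", "B-TICKER", "I-TICKER", "B-AMOUNT", "I-AMOUNT"]
id2label = {i: label for i, label in enumerate(custom_labels)}
label2id = {label: i for i, label in id2label.items()}

# Reuse the encoder weights; the token-classification head is re-initialized
# to the new label count and must then be fine-tuned on domain data.
model = AutoModelForTokenClassification.from_pretrained(
    "boltuix/bert-ner",
    num_labels=len(custom_labels),
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,
)
```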

## ❌ Limitations

- 📌 English-only out of the box
- 🚫 May not generalize to informal, low-resource, or code-mixed texts
- ⚖️ May reflect dataset bias (CoNLL-2012 is newswire-heavy)

## 🛠️ Getting Started

### 🧪 Inference Code

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load the fine-tuned NER tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("boltuix/bert-ner")
model = AutoModelForTokenClassification.from_pretrained("boltuix/bert-ner")

text = "Barack Obama visited Microsoft headquarters in Seattle."
inputs = tokenizer(text, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring label id for each token
predictions = outputs.logits.argmax(dim=-1)

# Map token ids back to tokens and label ids back to label names
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
label_map = model.config.id2label
labels = [label_map[p.item()] for p in predictions[0]]

# Print token/label pairs, skipping special tokens like [CLS] and [SEP]
for token, label in zip(tokens, labels):
    if token not in tokenizer.all_special_tokens:
        print(f"{token:15} → {label}")
```

### ✨ Example Output

```text
barack          → B-PERSON
obama           → I-PERSON
visited         → O
microsoft       → B-ORG
headquarters    → O
in              → O
seattle         → B-GPE
.               → O
```
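
If you prefer whole entities over raw subword tags, the standard `transformers` pipeline API (a convenience alternative, not code from the original card) can merge B-/I- pieces for you:

```python
from transformers import pipeline

# aggregation_strategy="simple" merges subword tokens and B-/I- tags into
# whole entities, each with a confidence score and character offsets.
ner = pipeline(
    "token-classification",
    model="boltuix/bert-ner",
    aggregation_strategy="simple",
)

for entity in ner("Barack Obama visited Microsoft headquarters in Seattle."):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```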

## 🧠 Entity Labels (CoNLL-2012)

Here are all 37 labels supported by the model:

- 🔹 O – Outside

### 🔢 Beginning (B-) Tags

- 🔢 B-CARDINAL – CARDINAL
- 📅 B-DATE – DATE
- 🎉 B-EVENT – EVENT
- 🏢 B-FAC – FACILITY
- 🌍 B-GPE – COUNTRY/CITY
- 🗣️ B-LANGUAGE – LANGUAGE
- ⚖️ B-LAW – LAW
- 🗺️ B-LOC – LOCATION
- 💰 B-MONEY – MONEY
- 🧑‍🤝‍🧑 B-NORP – GROUP
- 🔟 B-ORDINAL – ORDINAL
- 🏛️ B-ORG – ORGANIZATION
- 📊 B-PERCENT – PERCENT
- 👤 B-PERSON – PERSON
- 📦 B-PRODUCT – PRODUCT
- 📏 B-QUANTITY – QUANTITY
- ⏰ B-TIME – TIME
- 🎨 B-WORK_OF_ART – WORK_OF_ART

### 🔢 Inside (I-) Tags

- 🔢 I-CARDINAL
- 📅 I-DATE
- 🎉 I-EVENT
- 🏢 I-FAC
- 🌍 I-GPE
- 🗣️ I-LANGUAGE
- ⚖️ I-LAW
- 🗺️ I-LOC
- 💰 I-MONEY
- 🧑‍🤝‍🧑 I-NORP
- 🔟 I-ORDINAL
- 🏛️ I-ORG
- 📊 I-PERCENT
- 👤 I-PERSON
- 📦 I-PRODUCT
- 📏 I-QUANTITY
- ⏰ I-TIME
- 🎨 I-WORK_OF_ART
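
To confirm this label inventory programmatically (a quick check, assuming the checkpoint is reachable), the mapping is stored in the model config:

```python
from transformers import AutoConfig

# The id-to-label mapping ships with the checkpoint's configuration.
config = AutoConfig.from_pretrained("boltuix/bert-ner")
print(len(config.id2label))              # expected: 37
print(sorted(config.id2label.values()))  # O plus the B-/I- tags listed above
```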


## 📈 Performance

| Metric       | Score |
|--------------|-------|
| 🎯 Precision | 0.85  |
| 🕸️ Recall    | 0.87  |
| 🎶 F1 Score  | 0.86  |
| ✅ Accuracy  | 0.92  |
- 📊 Evaluation tool: seqeval
- 🧪 Dataset: CoNLL-2012 test split
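
For reference, this is roughly how seqeval scores tag sequences, at the entity level rather than the token level. The sequences below are made up for illustration, not test-set data.

```python
from seqeval.metrics import classification_report, f1_score

# Made-up gold and predicted BIO sequences, one inner list per sentence.
y_true = [["B-PERSON", "I-PERSON", "O", "B-ORG", "O", "B-GPE"]]
y_pred = [["B-PERSON", "I-PERSON", "O", "B-ORG", "O", "O"]]

print(f1_score(y_true, y_pred))            # entity-level F1 on this toy pair
print(classification_report(y_true, y_pred))
```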

## ⚙️ Training Setup

- 💻 Hardware: NVIDIA GPU
- ⏱️ Training Time: ~2 hours
- 🐘 Parameters: ~11M
- 🎛️ Optimizer: AdamW (default)
- 📦 Mixed precision: No (fp32)
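
The exact hyperparameters were not published, so the configuration below is a hedged sketch consistent with the bullets above (AdamW as the Trainer default, no mixed precision); every value marked as an assumption is one.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="bert-ner-finetuned",  # hypothetical output path
    num_train_epochs=3,               # assumption; not stated above
    per_device_train_batch_size=16,   # assumption; not stated above
    learning_rate=5e-5,               # assumption; not stated above
    optim="adamw_torch",              # AdamW, the Trainer default
    fp16=False,                       # fp32 training, per the note above
)
```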

## 🌍 Carbon Impact

- 💻 Trained locally
- ☁️ Region: Boltuix’s Base
- 🌱 Emissions: ~50 g CO₂eq
- 📊 Measured via: ML Impact

## ✍️ Contact