---
license: apache-2.0
datasets:
- Utkarsha666/NaBI
language:
- ne
base_model:
- google-bert/bert-base-multilingual-cased
pipeline_tag: text-classification
tags:
- hate_speech
- bias
- misinformation
---

# NaBI Model: Nepali Bias & Information Classifier

The **NaBI Model** is a text classifier for Nepali content that automatically detects bias, misinformation, and hate speech. It was trained on a balanced dataset created with oversampling to address class imbalance in the real-world NaBI data, and achieves **99% accuracy** on this balanced split.

## Overview

- **Task:** Multi-class text classification
- **Categories:**
  - Bias (editorial bias, user comment bias, etc.)
  - Normal
  - Misinformation
  - Hate Speech
- **Model Performance:** Achieves **99% accuracy** on a balanced dataset obtained via oversampling to mitigate class imbalance. Running the model on unlabeled real-world data can help label additional biased and misinformation news, paving the way for continuous dataset expansion.
- **Dataset Details:** The dataset is derived from real-world Nepali content, which was originally imbalanced. Oversampling was applied during training to ensure sufficient representation of the underrepresented classes.
- **Real-World Implications and Future Work:** Although oversampling allowed the model to learn effectively from balanced data, the original dataset remains imbalanced. Running inference on unlabeled real-world data (biased or misinformation news, etc.) can facilitate the creation of a larger, more diverse dataset over time.

## Usage

Below is a simple example of how to use the NaBI Model with the Hugging Face Transformers library:

```python
from transformers import pipeline

# Load the model
classifier = pipeline("text-classification", model="Utkarsha666/NaBI-Bert")

# Classify a sample Nepali text ("Place your Nepali text here.")
sample_text = "यहाँ नेपालीमा तपाईंको पाठ राख्नुहोस्।"
result = classifier(sample_text)
print(result)
```
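
If you need explicit control over tokenization and decoding, the checkpoint can also be loaded with `AutoTokenizer` and `AutoModelForSequenceClassification`. The sketch below reads label names from the checkpoint's `id2label` config rather than hardcoding them, since the exact label strings are not documented here:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Utkarsha666/NaBI-Bert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

text = "यहाँ नेपालीमा तपाईंको पाठ राख्नुहोस्।"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1)[0]
pred_id = int(probs.argmax())

# id2label comes from the checkpoint's config; the label strings it contains
# may be worded differently from the category names listed in the Overview.
print(model.config.id2label[pred_id], float(probs[pred_id]))
```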
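
For the dataset-expansion workflow described in the Overview, the same pipeline can be run over a batch of unlabeled real-world texts and the confident predictions kept for human review. The file names and the confidence threshold below are illustrative assumptions, not part of the released dataset:

```python
import csv
from transformers import pipeline

classifier = pipeline("text-classification", model="Utkarsha666/NaBI-Bert")

# Hypothetical input file: one unlabeled Nepali text per line.
with open("unlabeled_nepali_texts.txt", encoding="utf-8") as f:
    texts = [line.strip() for line in f if line.strip()]

# Classify in batches and keep only confident predictions for review.
CONFIDENCE_THRESHOLD = 0.90  # illustrative value
results = classifier(texts, batch_size=16, truncation=True)

with open("weak_labels.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["text", "label", "score"])
    for text, pred in zip(texts, results):
        if pred["score"] >= CONFIDENCE_THRESHOLD:
            writer.writerow([text, pred["label"], f"{pred['score']:.4f}"])
```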
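
For reference, the class balancing mentioned in the Overview can be approximated with simple random oversampling of the minority classes. This is a generic sketch over a hypothetical dataframe with `text` and `label` columns, not necessarily the exact procedure used to train NaBI:

```python
import pandas as pd

# Hypothetical training frame with "text" and "label" columns.
df = pd.DataFrame({
    "text": ["sample text 1", "sample text 2", "sample text 3"],
    "label": ["normal", "bias", "normal"],
})

# Oversample every class up to the size of the largest class.
max_count = df["label"].value_counts().max()
balanced = (
    df.groupby("label", group_keys=False)
      .apply(lambda g: g.sample(max_count, replace=True, random_state=42))
      .reset_index(drop=True)
)
print(balanced["label"].value_counts())
```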