
license: apache-2.0
datasets:
  - Utkarsha666/NaBI
language:
  - ne
base_model:
  - google-bert/bert-base-multilingual-cased
pipeline_tag: text-classification
tags:
  - hate_speech
  - bias
  - misinformation

NaBI Model: Nepali Bias & Information Classifier

The NaBI Model is a text classifier for Nepali content, designed to automatically detect bias, misinformation, and hate speech. Trained on a balanced dataset created using oversampling techniques to address class imbalances in the real-world NaBI data, the model achieves 99% accuracy on this balanced split.

Overview

Task: Multi-Class Text Classification
Categories:
- Bias (editorial bias, user comment bias, etc.)
- Normal
- Misinformation
- Hate Speech
Model Performance:
Achieves 99% accuracy on a balanced dataset obtained by oversampling to mitigate class imbalance.
Dataset Details:
The dataset is derived from real-world Nepali content, which was originally imbalanced. Oversampling was used during training to ensure sufficient representation of underrepresented classes.
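The card does not say which oversampling technique was used. The sketch below assumes plain random oversampling with replacement, which duplicates minority-class examples until every class matches the largest one. The function name `random_oversample` and the example labels are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def random_oversample(
    texts: Sequence[str], labels: Sequence[str], seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Randomly duplicate minority-class examples (with replacement)
    until every class has as many examples as the largest class."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    rng = random.Random(seed)

    # Group texts by their class label.
    by_label: Dict[str, List[str]] = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    if not by_label:
        return [], []

    target = max(len(items) for items in by_label.values())
    pairs: List[Tuple[str, str]] = []
    for label, items in by_label.items():
        # Draw the missing examples at random from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        pairs.extend((text, label) for text in items + extra)

    # Shuffle so the classes are interleaved rather than grouped.
    rng.shuffle(pairs)
    return [text for text, _ in pairs], [label for _, label in pairs]
```

Suppose the input has three `Normal` texts and one `Bias` text. The result then contains three of each, and the extra `Bias` rows are copies of the single original. A common precaution with this technique is to oversample only the training split, so that duplicated examples cannot also appear in the evaluation data.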
Real-World Implications and Future Work:
Although oversampling allowed the model to learn from balanced data, the original dataset remains imbalanced. Running this model on unlabeled real-world data (for example, potentially biased or misleading news) could help label additional examples and, over time, build a larger and more diverse dataset.
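One way to put this into practice is confidence-based pseudo-labeling. You run the classifier over unlabeled text and keep only predictions above a score threshold as candidates for the dataset. The sketch assumes a classifier that takes a list of strings and returns one `{"label": ..., "score": ...}` dict per input, as the `pipeline` object in the Usage section does. The helper name `pseudo_label` and the 0.9 threshold are illustrative choices, not part of the NaBI release.

```python
from typing import Any, Callable, Dict, List, Sequence

Prediction = Dict[str, Any]


def pseudo_label(
    texts: Sequence[str],
    classify: Callable[[List[str]], List[Prediction]],
    threshold: float = 0.9,
) -> List[Prediction]:
    """Return {"text", "label", "score"} records for predictions whose
    confidence is at least `threshold`; the rest are discarded."""
    if not texts:
        return []
    predictions = classify(list(texts))
    if len(predictions) != len(texts):
        raise ValueError("classifier must return one prediction per text")

    accepted: List[Prediction] = []
    for text, pred in zip(texts, predictions):
        if pred["score"] >= threshold:
            accepted.append(
                {"text": text, "label": pred["label"], "score": pred["score"]}
            )
    return accepted
```

With the pipeline from the Usage section, a call would look like `pseudo_label(unlabeled_texts, classifier, threshold=0.9)`. Candidates accepted this way still carry the model's errors, so reviewing them before adding them to the dataset is advisable.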

Usage

Below is a simple example of how to use the NaBI Model with the Hugging Face Transformers library:

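```python
from transformers import pipeline

# Load the NaBI classifier from the Hugging Face Hub
classifier = pipeline("text-classification", model="Utkarsha666/NaBI-Bert")

# Classify a sample Nepali text
# (placeholder meaning "Put your text in Nepali here.")
sample_text = "यहाँ नेपालीमा तपाईंको पाठ राख्नुहोस्।"
result = classifier(sample_text)

# A list with one {"label": ..., "score": ...} dict for the input
print(result)
```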
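By default, the pipeline returns only the top label. Passing `top_k=None` returns a score for every class. Passing `truncation=True` cuts inputs longer than the model's maximum sequence length (512 tokens for BERT-base models), which matters for full news articles. For example, `classifier([sample_text], top_k=None, truncation=True)` returns one list of `{"label", "score"}` dicts per input text.

The helpers below have hypothetical names. One turns such a list into a lookup table, and the other picks the highest-scoring label. The exact label strings come from `classifier.model.config.id2label`. It is worth printing that once to confirm how the labels map onto Bias, Normal, Misinformation, and Hate Speech.

```python
from typing import Any, Dict, List, Tuple

# One text's output from classifier(..., top_k=None):
# a list of {"label": ..., "score": ...} dicts.
ScoreList = List[Dict[str, Any]]


def scores_by_label(scores: ScoreList) -> Dict[str, float]:
    """Map each class label to its score."""
    return {item["label"]: float(item["score"]) for item in scores}


def top_prediction(scores: ScoreList) -> Tuple[str, float]:
    """Return the highest-scoring (label, score) pair."""
    if not scores:
        raise ValueError("empty score list")
    best = max(scores, key=lambda item: item["score"])
    return best["label"], float(best["score"])
```

For a batch, apply the helpers per text, for example `[top_prediction(s) for s in classifier(texts, top_k=None, truncation=True)]`.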