---
license: apache-2.0
datasets:
  - Utkarsha666/NaBI
language:
  - ne
base_model:
  - google-bert/bert-base-multilingual-cased
pipeline_tag: text-classification
tags:
  - hate_speech
  - bias
  - misinformation
---

NaBI Model: Nepali Bias & Information Classifier

The NaBI Model is a Nepali text classifier, fine-tuned from bert-base-multilingual-cased, that labels content as bias, misinformation, hate speech, or normal. It was trained on a version of the real-world NaBI dataset that was balanced by oversampling to offset class imbalance, and it achieves 99% accuracy on this balanced split.
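
The README does not say which oversampling method was used. As a minimal sketch, assuming plain random oversampling (duplicating minority-class examples with replacement), balancing a labeled dataset could look like this. The random_oversample helper and the toy texts and labels are hypothetical; libraries such as imbalanced-learn offer equivalent utilities.

```python
import random
from collections import Counter, defaultdict


def random_oversample(texts, labels, seed=0):
    """Randomly duplicate minority-class examples until every class has
    as many examples as the largest one. Returns new (texts, labels) lists."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    if not texts:
        return [], []

    rng = random.Random(seed)  # fixed seed for reproducible duplication
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    target = max(len(items) for items in by_label.values())

    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Keep every original example, then sample extra copies with replacement.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    # Output is grouped by class; shuffle it before training.
    return out_texts, out_labels


if __name__ == "__main__":
    texts = ["t1", "t2", "t3", "t4", "t5", "t6"]
    labels = ["Normal", "Normal", "Normal", "Normal", "Bias", "Hate Speech"]
    _, balanced_labels = random_oversample(texts, labels)
    print(Counter(balanced_labels))  # each class now has 4 examples
```

Because this duplicates examples, it is usually applied to the training split only, after the train/test split, so that copies of the same text do not end up on both sides of the evaluation.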

Overview

  • Task: Multi-Class Text Classification
    Categories:

    • Bias (editorial bias, user comment bias, etc.)
    • Normal
    • Misinformation
    • Hate Speech
  • Model Performance:
    Achieves 99% accuracy on a balanced dataset obtained by oversampling to mitigate class imbalance. This figure applies to the balanced split; accuracy on the original, imbalanced class distribution is not reported here.

  • Dataset Details:
    The dataset (Utkarsha666/NaBI) is derived from real-world Nepali content, and its classes were originally imbalanced. Oversampling was used during training so that underrepresented classes are adequately represented.

  • Real-World Implications and Future Work:
    Although oversampling let the model learn from balanced data, the original dataset remains imbalanced. Running this model on unlabeled real-world content (for example, news that may be biased or misleading) can help label additional examples and gradually build a larger, more diverse dataset.
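
As a rough illustration of that labeling workflow, the hypothetical helper below keeps only predictions for the classes of interest whose confidence passes a threshold, so they can be reviewed before being added to the dataset. It assumes the output shape of the Transformers text-classification pipeline (one {"label", "score"} dict per input). The label strings and the 0.9 threshold are placeholders: this README documents neither the model's label names nor any selection rule.

```python
def select_candidates(texts, predictions, labels_of_interest, min_score=0.9):
    """Return (text, label, score) triples whose predicted label is in
    labels_of_interest and whose score is at least min_score, highest
    score first, so they can be queued for manual review."""
    if len(texts) != len(predictions):
        raise ValueError("texts and predictions must have the same length")
    candidates = []
    for text, pred in zip(texts, predictions):
        if pred["label"] in labels_of_interest and pred["score"] >= min_score:
            candidates.append((text, pred["label"], pred["score"]))
    # Most confident candidates first.
    candidates.sort(key=lambda item: item[2], reverse=True)
    return candidates


if __name__ == "__main__":
    # Hypothetical predictions in the pipeline's {"label", "score"} shape.
    texts = ["news 1", "news 2", "news 3"]
    preds = [
        {"label": "Misinformation", "score": 0.97},
        {"label": "Normal", "score": 0.99},
        {"label": "Bias", "score": 0.62},
    ]
    print(select_candidates(texts, preds, {"Bias", "Misinformation"}))
    # prints [('news 1', 'Misinformation', 0.97)]
```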

Usage

Below is a simple example of how to use the NaBI Model with the Hugging Face Transformers library:

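```python
from transformers import pipeline

# Load the model from the Hugging Face Hub
classifier = pipeline("text-classification", model="Utkarsha666/NaBI-Bert")

# Classify a sample Nepali text ("Put your text here in Nepali.")
sample_text = "यहाँ नेपालीमा तपाईंको पाठ राख्नुहोस्।"
result = classifier(sample_text)

# result is a list holding one {"label": ..., "score": ...} dict
print(result)
```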
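
For more than one input, a sketch along these lines may help. It passes a list of texts to the same pipeline with batch_size (8 is an arbitrary choice) and truncation=True, so that articles longer than BERT's 512-token limit are cut rather than causing an error. The classify_texts helper and the placeholder texts are hypothetical, and forwarding truncation to the tokenizer assumes a reasonably recent Transformers release. The returned label strings come from the model's configuration, which this README does not list.

```python
def classify_texts(classifier, texts, batch_size=8):
    """Classify a list of texts in batches, truncating inputs that exceed
    the model's maximum sequence length (512 tokens for BERT-base)."""
    predictions = classifier(texts, batch_size=batch_size, truncation=True)
    # With a list input, the pipeline returns one {"label", "score"} dict per text.
    return [
        {"text": text, "label": pred["label"], "score": pred["score"]}
        for text, pred in zip(texts, predictions)
    ]


if __name__ == "__main__":
    # Imported here so the helper above can be used without loading the model.
    from transformers import pipeline

    classifier = pipeline("text-classification", model="Utkarsha666/NaBI-Bert")
    texts = ["पहिलो समाचार ...", "दोस्रो समाचार ..."]  # placeholder inputs
    for row in classify_texts(classifier, texts):
        print(row["label"], round(row["score"], 3))
```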