---
license: apache-2.0
datasets:
- Utkarsha666/NaBI
language:
- ne
base_model:
- google-bert/bert-base-multilingual-cased
pipeline_tag: text-classification
tags:
- hate_speech
- bias
- misinformation
---

# NaBI Model: Nepali Bias & Information Classifier

The **NaBI Model** is a text classifier for Nepali content that automatically detects bias, misinformation, and hate speech. It was trained on a balanced dataset created with oversampling to address class imbalance in the real-world NaBI data, and achieves **99% accuracy** on this balanced split.

## Overview

- **Task:** Multi-class text classification
- **Categories:**
  - Bias (editorial bias, user comment bias, etc.)
  - Normal
  - Misinformation
  - Hate Speech
- **Model Performance:** Achieves **99% accuracy** on a balanced dataset obtained via oversampling to mitigate class imbalance. Running the model on unlabeled real-world data can help label additional biased and misinformation news, paving the way for continuous dataset expansion.
- **Dataset Details:** The dataset is derived from real-world Nepali content, which was originally imbalanced. Oversampling was applied during training to ensure sufficient representation of the underrepresented classes.
- **Real-World Implications and Future Work:** Although oversampling allowed the model to learn effectively from balanced data, the original dataset remains imbalanced. Running inference on unlabeled real-world data (biased or misinformation news, etc.) can facilitate the creation of a larger, more diverse dataset over time.

## Usage

Below is a simple example of how to use the NaBI Model with the Hugging Face Transformers library:

```python
from transformers import pipeline

# Load the model
classifier = pipeline("text-classification", model="Utkarsha666/NaBI-Bert")

# Classify a sample Nepali text ("Place your Nepali text here.")
sample_text = "यहाँ नेपालीमा तपाईंको पाठ राख्नुहोस्।"
result = classifier(sample_text)
print(result)
```
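
If you need explicit control over tokenization and decoding, the checkpoint can also be loaded with `AutoTokenizer` and `AutoModelForSequenceClassification`. The sketch below reads label names from the checkpoint's `id2label` config rather than hardcoding them, since the exact label strings are not documented here:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Utkarsha666/NaBI-Bert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

text = "यहाँ नेपालीमा तपाईंको पाठ राख्नुहोस्।"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1)[0]
pred_id = int(probs.argmax())

# id2label comes from the checkpoint's config; the label strings it contains
# may be worded differently from the category names listed in the Overview.
print(model.config.id2label[pred_id], float(probs[pred_id]))
```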
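
For the dataset-expansion workflow described in the Overview, the same pipeline can be run over a batch of unlabeled real-world texts and the confident predictions kept for human review. The file names and the confidence threshold below are illustrative assumptions, not part of the released dataset:

```python
import csv
from transformers import pipeline

classifier = pipeline("text-classification", model="Utkarsha666/NaBI-Bert")

# Hypothetical input file: one unlabeled Nepali text per line.
with open("unlabeled_nepali_texts.txt", encoding="utf-8") as f:
    texts = [line.strip() for line in f if line.strip()]

# Classify in batches and keep only confident predictions for review.
CONFIDENCE_THRESHOLD = 0.90  # illustrative value
results = classifier(texts, batch_size=16, truncation=True)

with open("weak_labels.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["text", "label", "score"])
    for text, pred in zip(texts, results):
        if pred["score"] >= CONFIDENCE_THRESHOLD:
            writer.writerow([text, pred["label"], f"{pred['score']:.4f}"])
```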
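
For reference, the class balancing mentioned in the Overview can be approximated with simple random oversampling of the minority classes. This is a generic sketch over a hypothetical dataframe with `text` and `label` columns, not necessarily the exact procedure used to train NaBI:

```python
import pandas as pd

# Hypothetical training frame with "text" and "label" columns.
df = pd.DataFrame({
    "text": ["sample text 1", "sample text 2", "sample text 3"],
    "label": ["normal", "bias", "normal"],
})

# Oversample every class up to the size of the largest class.
max_count = df["label"].value_counts().max()
balanced = (
    df.groupby("label", group_keys=False)
      .apply(lambda g: g.sample(max_count, replace=True, random_state=42))
      .reset_index(drop=True)
)
print(balanced["label"].value_counts())
```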