Toxic Conversation Classifier
This is a fine-tuned transformer model that classifies whether a given conversation excerpt contains toxic language.
Model Description
This model takes a text input and predicts whether the text is considered toxic. It was fine-tuned on the mteb/toxic_conversations_50k dataset, which comprises a diverse set of online conversations labeled for toxicity. It builds on a pre-trained language model, relying on its general understanding of language to pick up the patterns associated with toxic speech.
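If you want to inspect the training data yourself, it can be loaded with the Hugging Face datasets library. The snippet below is a minimal sketch; the split and column names ("train", "text", "label") are assumptions and should be checked against the dataset card.

from datasets import load_dataset

# Load the toxicity dataset used for fine-tuning
# (split and column names here are assumptions, not guaranteed by this card)
dataset = load_dataset("mteb/toxic_conversations_50k", split="train")

example = dataset[0]
print(example["text"])   # the raw conversation excerpt
print(example["label"])  # the toxicity label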
Intended Uses & Limitations
This model is intended for research purposes and for developers looking to build applications that can identify and filter toxic content in online conversations. Potential use cases include:
- Content Moderation: Assisting human moderators in identifying and flagging potentially toxic comments or messages (a minimal flagging sketch follows this list).
- Building Safer Online Communities: Integrating the model into platforms to help create more positive and inclusive online environments.
- Research on Online Toxicity: Serving as a tool for researchers studying the prevalence and nature of toxic language online.
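As a concrete illustration of the content-moderation use case above, the sketch below flags comments for human review using the transformers text-classification pipeline. The label name ("LABEL_1" for the toxic class) and the score threshold are assumptions; verify them against the model's configuration before relying on them.

from transformers import pipeline

# Hypothetical moderation helper: flag comments the classifier scores as toxic
classifier = pipeline("text-classification", model="frameai/ToxicSentences-detector")

comments = [
    "Thanks for the helpful answer!",
    "You are an idiot and nobody wants you here.",
]

for comment, result in zip(comments, classifier(comments)):
    # Assumption: the positive (toxic) class is exposed as "LABEL_1"
    if result["label"] == "LABEL_1" and result["score"] > 0.8:
        print(f"FLAGGED for review: {comment!r} (score={result['score']:.2f})")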
Limitations:
- Bias: As with any model trained on human-labeled data, this model may exhibit biases present in the training dataset. It's crucial to be aware that what constitutes "toxic" can be subjective and culturally dependent.
- Contextual Understanding: While the model is capable of capturing some contextual nuances, it might misclassify instances where sarcasm or irony are present.
- Evolving Language: The nature of toxic language can evolve rapidly, and the model's performance might degrade over time as new forms of online harassment emerge.
- False Positives/Negatives: The model may incorrectly classify non-toxic text as toxic (false positive) or fail to identify truly toxic content (false negative). Careful consideration should be given to the impact of these errors in any application; see the probability-threshold sketch after this list.
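Because the cost of false positives and false negatives differs between applications, it can help to work with class probabilities rather than a hard argmax and to tune the decision threshold on a labeled validation set. The following is a minimal sketch, assuming that index 1 corresponds to the toxic class; the threshold value is purely illustrative.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "frameai/ToxicSentences-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "I can't believe you would say something like that."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities; index 1 is assumed to be the toxic class
toxic_prob = torch.softmax(logits, dim=-1)[0, 1].item()

# A stricter threshold reduces false positives at the cost of more false negatives
THRESHOLD = 0.9  # hypothetical value; tune on your own validation data
print("toxic" if toxic_prob >= THRESHOLD else "non-toxic", f"(p={toxic_prob:.2f})")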
How to Use
You can load and use this model with the Hugging Face transformers library:
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "frameai/ToxicSentences-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "This is a terrible thing to say!"

# Tokenize the input and run a forward pass without tracking gradients
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The predicted class is the index of the highest logit (1 = toxic, 0 = non-toxic)
prediction = outputs.logits.argmax(dim=-1).item()

if prediction == 1:
    print("The text is predicted as toxic.")
else:
    print("The text is predicted as non-toxic.")