Toxic Conversation Classifier

This is a fine-tuned transformer-based classifier that predicts whether a given conversation excerpt contains toxic language.

Model Description

This model takes a text input and predicts whether the text is considered toxic. It was trained on the mteb/toxic_conversations_50k dataset, which comprises a diverse set of online conversations labeled for toxicity. The architecture builds on a pre-trained language model to capture the nuances of language and identify patterns associated with toxicity.
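
If you want to inspect the training data, the dataset can be loaded directly with the Hugging Face datasets library. The snippet below is a minimal sketch; the "text" and "label" column names and the 1 = toxic convention are assumptions based on typical MTEB classification datasets, so verify them against the actual dataset features.

from datasets import load_dataset

# Load the dataset used to train this model.
dataset = load_dataset("mteb/toxic_conversations_50k")
print(dataset)  # shows the available splits and columns

# Assumed column layout: "text" (string) and "label" (0 = non-toxic, 1 = toxic).
example = dataset["train"][0]
print(example["text"], example["label"])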

Intended Uses & Limitations

This model is intended for research purposes and for developers looking to build applications that can identify and filter toxic content in online conversations. Potential use cases include:

  • Content Moderation: Assisting human moderators in identifying and flagging potentially toxic comments or messages (a minimal sketch of this workflow follows this list).
  • Building Safer Online Communities: Integrating the model into platforms to help create more positive and inclusive online environments.
  • Research on Online Toxicity: Serving as a tool for researchers studying the prevalence and nature of toxic language online.
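
As a rough illustration of the content-moderation use case above, the sketch below wraps the model in a text-classification pipeline and returns the comments predicted as toxic for human review. The "LABEL_1" string is an assumption based on the default id2label mapping; check model.config.id2label before relying on it.

from transformers import pipeline

# Text-classification pipeline around the toxicity model.
classifier = pipeline("text-classification", model="frameai/ToxicSentences-detector")

def flag_for_review(comments):
    """Return the comments the model predicts as toxic, for human review."""
    results = classifier(comments)
    # Assumes the default mapping where "LABEL_1" corresponds to the toxic class.
    return [c for c, r in zip(comments, results) if r["label"] == "LABEL_1"]

print(flag_for_review(["Have a great day!", "You are an idiot."]))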

Limitations:

  • Bias: As with any model trained on human-labeled data, this model may exhibit biases present in the training dataset. It's crucial to be aware that what constitutes "toxic" can be subjective and culturally dependent.
  • Contextual Understanding: While the model captures some contextual nuances, it may misclassify instances where sarcasm or irony is present.
  • Evolving Language: The nature of toxic language can evolve rapidly, and the model's performance might degrade over time as new forms of online harassment emerge.
  • False Positives/Negatives: The model may incorrectly classify non-toxic text as toxic (false positive) or fail to identify truly toxic content (false negative). Careful consideration should be given to the impact of these errors in any application.
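
One way to manage the false-positive/false-negative trade-off described above is to act on the model's softmax probability rather than the argmax, raising or lowering a decision threshold depending on which kind of error is more costly in your application. The sketch below assumes class index 1 is the toxic class (consistent with the usage example later in this card); the 0.8 threshold is an illustrative placeholder, not a tuned value.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "frameai/ToxicSentences-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def toxic_probability(text: str) -> float:
    """Return the model's probability that the text is toxic (assumes class index 1)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

THRESHOLD = 0.8  # illustrative placeholder; tune on a validation set for your error costs
print(toxic_probability("You are the worst.") >= THRESHOLD)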

How to Use

You can easily load and use this model with the Hugging Face transformers library:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "frameai/ToxicSentences-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "This is a terrible thing to say!"
inputs = tokenizer(text, return_tensors="pt")

# Run inference without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)

# Pick the class with the highest logit (1 = toxic, 0 = non-toxic).
prediction = outputs.logits.argmax(dim=-1).item()

if prediction == 1:
    print("The text is predicted as toxic.")
else:
    print("The text is predicted as non-toxic.")