Hate Speech Detector

"Hate Speech Detector" is a text classification model based on DeBERTa that predicts whether a text contains hate speech. The model is fine-tuned on the tweet_eval dataset, which consists of seven heterogeneous tasks on Twitter, all framed as multi-class tweet classification. The 'hate' subset is used for this task.

This model is part of our series of moderation models, which includes the following other models that may be of interest to you:

Offensive Speech Detector

We believe these models can be used in tandem, each complementing the other, to build a more robust moderation tool.
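As a minimal sketch of what "in tandem" could look like, the hypothetical policy below blocks a text when the hate score is high and sends it to review when only the offensive score is high. The function name `moderate`, the thresholds, and the assumption that each model is wrapped as a callable returning the probability of its positive class are all illustrative; neither model ships such an interface.

```python
from typing import Callable

# Hypothetical scorer type: maps a text to the probability of the positive class
Scorer = Callable[[str], float]


def moderate(
    text: str,
    hate_score: Scorer,
    offensive_score: Scorer,
    hate_threshold: float = 0.8,       # illustrative value, tune on your own data
    offensive_threshold: float = 0.8,  # illustrative value, tune on your own data
) -> str:
    """Combine two detector scores into one moderation decision (hypothetical policy)."""
    if hate_score(text) >= hate_threshold:
        return "block"
    if offensive_score(text) >= offensive_threshold:
        return "review"
    return "allow"


# Usage with stand-in scorers; in practice these would wrap the two models
print(moderate("example text", lambda t: 0.9, lambda t: 0.1))   # block
print(moderate("example text", lambda t: 0.2, lambda t: 0.95))  # review
print(moderate("example text", lambda t: 0.2, lambda t: 0.1))   # allow
```

Checking the stricter category first means a text that trips both detectors is handled by the more severe rule.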

Intended uses & limitations

Hate Speech Detector is intended to be used as a tool for detecting hate speech in texts, which can be useful for applications such as content moderation, sentiment analysis, or social media analysis. The model can be used to filter out or flag tweets that contain hate speech, or to analyze the prevalence and patterns of hate speech.
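Flagging tweets and measuring prevalence can be sketched as follows, assuming you have wrapped the model (for example with the Python code in the Usage section) as a callable that returns the hate-speech probability for one text. The name `flag_hate_speech` and the default threshold of 0.5 are hypothetical choices for illustration.

```python
from typing import Callable, Iterable


def flag_hate_speech(
    texts: Iterable[str],
    hate_probability: Callable[[str], float],  # hypothetical wrapper around the model
    threshold: float = 0.5,                    # illustrative cut-off
) -> tuple[list[str], float]:
    """Return the flagged texts and the share of texts that were flagged."""
    items = list(texts)
    flagged = [t for t in items if hate_probability(t) >= threshold]
    prevalence = len(flagged) / len(items) if items else 0.0
    return flagged, prevalence


# Usage with stand-in scores instead of real model outputs
scores = {"a": 0.9, "b": 0.1, "c": 0.7, "d": 0.2}
flagged, prevalence = flag_hate_speech(list(scores), lambda t: scores[t])
print(flagged, prevalence)  # ['a', 'c'] 0.5
```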

However, the model has some limitations that users should be aware of:

The model is only trained and evaluated on tweets, which are short and informal texts that may contain slang, abbreviations, emojis, hashtags, or user mentions. The model may not perform well on other types of texts, such as news articles, essays, or books.
The model is only trained and evaluated on English tweets. The model may not generalize well to other languages or dialects.
The model is based on the tweet_eval dataset, which may have some biases or errors in the annotation process. The labels are assigned by human annotators, who may have different opinions or criteria for what constitutes hate speech. The dataset may also not cover all possible forms or contexts of hate speech, such as sarcasm, irony, humor, or euphemism.
The model is a statistical classifier that outputs a probability score for each label. The model does not provide any explanation or justification for its predictions. The model may also make mistakes or produce false positives or false negatives. Users should not blindly trust the model's predictions without further verification or human oversight.
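One way to keep a human in the loop is to act automatically only on confident predictions and route the uncertain middle band to a reviewer. The sketch below converts the model's two logits into probabilities with a softmax and applies such a band. It assumes the tweet_eval label order (index 0 = not hate, index 1 = hate); check `model.config.id2label` for the actual mapping. The `triage` function and the 0.3/0.7 band are hypothetical.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def triage(
    logits: list[float],
    hate_index: int = 1,  # assumed position of the "hate" label
    low: float = 0.3,     # illustrative lower bound of the review band
    high: float = 0.7,    # illustrative upper bound of the review band
) -> str:
    """Decide automatically when confident; otherwise send to human review."""
    p_hate = softmax(logits)[hate_index]
    if p_hate >= high:
        return "flag"
    if p_hate <= low:
        return "allow"
    return "human_review"
```

Widening the band sends more texts to reviewers but reduces the number of unreviewed false positives and false negatives.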

Ethical Considerations

This is a model that deals with sensitive and potentially harmful language. Users should consider the ethical implications and potential risks of using or deploying this model in their applications or contexts. Some of the ethical issues that may arise are:

The model may reinforce or amplify existing biases or stereotypes in the data or in society. For example, the model may associate certain words or topics with hate speech based on their frequency or co-occurrence in the data, without considering the meaning or intent behind them. This may result in unfair or inaccurate predictions for some groups or individuals.
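A simple way to look for this kind of bias is to compare false positive rates (non-hateful texts wrongly flagged) across groups. The sketch below assumes you have an evaluation set in which each text is annotated with the group or dialect it relates to; such annotations are not part of the model and would have to be supplied by you. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def false_positive_rate_by_group(
    records: Iterable[tuple[str, int, int]],
) -> dict[str, float]:
    """records: (group, true_label, predicted_label) triples, with 1 = hate.

    Returns, per group, the share of non-hateful texts that the model flagged.
    """
    negatives: dict[str, int] = defaultdict(int)
    false_positives: dict[str, int] = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 0:  # only non-hateful texts can be false positives
            negatives[group] += 1
            if y_pred == 1:
                false_positives[group] += 1
    return {g: false_positives[g] / negatives[g] for g in negatives}
```

A large gap between groups suggests the model over-flags texts associated with some groups and warrants closer inspection before deployment.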

Users should carefully consider the purpose, context, and impact of using this model, and take appropriate measures to prevent or mitigate any potential harm. Users should also respect the privacy and consent of the data subjects, and adhere to the relevant laws and regulations in their jurisdictions.

License

This model is licensed under the CodeML OpenRAIL-M 0.1 license, which is a variant of the BigCode OpenRAIL-M license. This license allows you to freely access, use, modify, and distribute this model and its derivatives, for research, commercial or non-commercial purposes, as long as you comply with the following conditions:

You must include a copy of the license and the original source of the model in any copies or derivatives of the model that you distribute.
You must not use the model or its derivatives for any unlawful, harmful, abusive, discriminatory, or offensive purposes, or to cause or contribute to any social or environmental harm.
You must respect the privacy and consent of the data subjects whose data was used to train or evaluate the model, and adhere to the relevant laws and regulations in your jurisdiction.
You must acknowledge that the model and its derivatives are provided "as is", without any warranties or guarantees of any kind, and that the licensor is not liable for any damages or losses arising from your use of the model or its derivatives.

By accessing or using this model, you agree to be bound by the terms of this license. If you do not agree with the terms of this license, you must not access or use this model.

Model Training Info

Problem type: Multi-class Classification
CO2 Emissions (in grams): 0.8636

Validation Metrics

Loss: 0.500
Accuracy: 0.763
Macro F1: 0.761
Micro F1: 0.763
Weighted F1: 0.764
Macro Precision: 0.763
Micro Precision: 0.763
Weighted Precision: 0.775
Macro Recall: 0.769
Micro Recall: 0.763
Weighted Recall: 0.763
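The averaging schemes above differ in how per-class scores are combined: macro takes the unweighted mean over classes, weighted averages them by each class's support, and micro pools all true and false positives first. In single-label classification, micro precision, micro recall, micro F1, and weighted recall all reduce to accuracy, which is why they all read 0.763. The pure-Python sketch below illustrates the definitions; it is not AutoTrain's evaluation code (scikit-learn's `f1_score` with `average="macro"`, `"micro"`, or `"weighted"` computes the same quantities).

```python
from collections import Counter


def f1_scores(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Macro, micro, and weighted F1 for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    per_class: dict[int, float] = {}
    tp_total = fp_total = fn_total = 0
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        per_class[c] = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        tp_total, fp_total, fn_total = tp_total + tp, fp_total + fp, fn_total + fn
    # Micro: pool counts over classes; equals accuracy for single-label data
    micro_p = tp_total / (tp_total + fp_total)
    micro_r = tp_total / (tp_total + fn_total)
    micro = 2 * micro_p * micro_r / (micro_p + micro_r) if micro_p + micro_r else 0.0
    macro = sum(per_class.values()) / len(labels)
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return {"macro": macro, "micro": micro, "weighted": weighted}
```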

Usage

You can use cURL to access this model:

$ curl -X POST -H "Authorization: Bearer YOUR_API_KEY" -H "Content-Type: application/json" -d '{"inputs": "I love AutoTrain"}' https://api-inference.huggingface.co/models/KoalaAI/HateSpeechDetector
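For environments without cURL, the same POST request can be built with Python's standard library. This sketch only constructs the request; sending it requires network access, a valid API key, and that the endpoint in the cURL command is still served. The `build_request` helper is hypothetical, and the response format is not described here.

```python
import json
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/KoalaAI/HateSpeechDetector"


def build_request(text: str, api_key: str) -> urllib.request.Request:
    """Build the same POST request as the cURL command, without sending it."""
    body = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


request = build_request("I love AutoTrain", "YOUR_API_KEY")
# To actually send it (network access and a valid key required):
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```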

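Or use the Python API:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the classifier and its tokenizer from the Hugging Face Hub
# (pass token="hf_..." if the repository requires authentication)
model = AutoModelForSequenceClassification.from_pretrained("KoalaAI/HateSpeechDetector")
tokenizer = AutoTokenizer.from_pretrained("KoalaAI/HateSpeechDetector")

# Tokenize the text and run inference without tracking gradients
inputs = tokenizer("I love AutoTrain", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to probabilities and look up the predicted label name
probs = outputs.logits.softmax(dim=-1)[0]
print(model.config.id2label[int(probs.argmax())], probs.tolist())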


Model format: Safetensors
Model size: 0.1B params
Tensor types: I64, F32
