BERTweet for sexism detection
This is a fine-tuned BERTweet-large model (BERTweet: A pre-trained language model for English Tweets) for detecting sexism. The training dataset is a new, balanced version of the Explainable Detection of Online Sexism (EDOS) dataset, sexism-socialmedia-balanced, consisting of 16,000 English entries gathered from two social media platforms, Twitter and Gab. The model achieved a Macro-F1 score of 0.85 and an accuracy of 0.88 on the test set for the EDOS task.
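For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Macro-F1 and accuracy are commonly computed. It is not the authors' evaluation script; the helper names `macro_f1` and `accuracy` and the toy labels are illustrative assumptions and are unrelated to the EDOS data.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels for illustration only (1 = sexist, 0 = not sexist is an assumption)
toy_true = [1, 1, 0, 0, 0, 0]
toy_pred = [1, 0, 0, 0, 0, 0]
print(f"Macro-F1: {macro_f1(toy_true, toy_pred):.3f}")
print(f"Accuracy: {accuracy(toy_true, toy_pred):.3f}")
```

Because Macro-F1 averages the per-class scores without weighting by class frequency, a weak minority class pulls it down more than it pulls down accuracy, which is why the two numbers can differ.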
How to use
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('tum-nlp/bertweet-sexism')
model = AutoModelForSequenceClassification.from_pretrained('tum-nlp/bertweet-sexism')
# Create the pipeline for classification
sexism_classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
# Predict and print the result (a list of dicts with 'label' and 'score' keys)
result = sexism_classifier("Girls like attention and they get desperate")
print(result)
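To classify several texts at once, a small wrapper around the pipeline can pair each input with its predicted label and score. The sketch below is an illustration, not part of the released model: `classify_batch` and its `threshold` parameter are hypothetical, and the label strings returned depend on the model's configuration, which is not documented here.

```python
from typing import Any, Callable, Dict, List


def classify_batch(
    classifier: Callable[[List[str]], List[Dict[str, Any]]],
    texts: List[str],
    threshold: float = 0.5,
) -> List[Dict[str, Any]]:
    """Run a text-classification pipeline over texts and collect the results.

    `classifier` is expected to return one dict per text with 'label' and
    'score' keys, as a transformers text-classification pipeline does when
    called with a list of strings.
    """
    outputs = classifier(texts)
    results = []
    for text, out in zip(texts, outputs):
        results.append(
            {
                "text": text,
                "label": out["label"],
                "score": out["score"],
                # Flag predictions whose score falls below the chosen threshold
                "confident": out["score"] >= threshold,
            }
        )
    return results


# Usage with the pipeline created above (requires downloading the model):
# for row in classify_batch(sexism_classifier, ["first text", "second text"], threshold=0.8):
#     print(row)
```

The threshold only marks low-scoring predictions for manual review; it does not change the predicted label.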
Citation
@inproceedings{rydelek-etal-2023-adamr,
title = "{A}dam{R} at {S}em{E}val-2023 Task 10: Solving the Class Imbalance Problem in Sexism Detection with Ensemble Learning",
author = "Rydelek, Adam and
Dementieva, Daryna and
Groh, Georg",
editor = {Ojha, Atul Kr. and
Do{\u{g}}ru{\"o}z, A. Seza and
Da San Martino, Giovanni and
Tayyar Madabushi, Harish and
Kumar, Ritesh and
Sartori, Elisa},
booktitle = "Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.semeval-1.190",
doi = "10.18653/v1/2023.semeval-1.190",
pages = "1371--1381",
abstract = "The Explainable Detection of Online Sexism task presents the problem of explainable sexism detection through fine-grained categorisation of sexist cases with three subtasks. Our team experimented with different ways to combat class imbalance throughout the tasks using data augmentation and loss alteration techniques. We tackled the challenge by utilising ensembles of Transformer models trained on different datasets, which are tested to find the balance between performance and interpretability. This solution ranked us in the top 40{\%} of teams for each of the tracks.",
}
Licensing Information
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.