Fine-tuned RoBERTa for Toxicity Classification in Spanish

This model was fine-tuned from Twitter-roBERTa base-sized for Sentiment Analysis, a base model trained on ~58M tweets. The training dataset is a gold standard for toxicity and incivility in Spanish, built from protest events.

The dataset comprises ~5M data points from three Latin American protest events: (a) protests against the coronavirus and judicial reform measures in Argentina during August 2020; (b) protests against education budget cuts in Brazil in May 2019; and (c) the social outburst in Chile stemming from protests against the underground fare hike in October 2019. We focus on interactions in Spanish to build a gold standard for digital interactions in this language; therefore, we prioritise Argentinian and Chilean data.

Labels: NONTOXIC and TOXIC.
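
A minimal usage sketch, assuming the checkpoint loads through the standard transformers text-classification pipeline and that its configuration exposes the two label names listed above (neither is confirmed on this page). The repository name is taken from this page; `load_classifier`, `is_toxic`, `demo`, the 0.5 threshold, and the sample sentences are hypothetical and for illustration only.

```python
from typing import Any, Dict

# Repository name as it appears on this page.
MODEL_ID = "bgonzalezbustamante/ft-roberta-toxicity"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (downloads the weights on first call)."""
    # Imported lazily so the helper below works without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def is_toxic(prediction: Dict[str, Any], threshold: float = 0.5) -> bool:
    """Hypothetical helper: True when the top label is TOXIC with enough confidence.

    Assumes the pipeline returns dicts of the form {"label": ..., "score": ...}
    and that the label strings are "NONTOXIC" and "TOXIC".
    """
    return prediction["label"] == "TOXIC" and float(prediction["score"]) >= threshold


def demo() -> None:
    """Classify two illustrative Spanish sentences and print the raw predictions."""
    classifier = load_classifier()
    texts = [
        "Gracias por compartir la información.",
        "No estoy de acuerdo con esta medida.",
    ]
    for text, prediction in zip(texts, classifier(texts)):
        print(text, prediction, is_toxic(prediction))
```

Calling `demo()` requires the transformers package with a PyTorch backend and network access to the Hub. If the checkpoint reports generic names such as `LABEL_0` and `LABEL_1`, the comparison in `is_toxic` would need to be adjusted accordingly.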

We suggest using bert-spanish-toxicity or ft-xlm-roberta-toxicity instead of this model.

Validation Metrics

  • Accuracy: 0.790
  • Precision: 0.920
  • Recall: 0.657
  • F1-Score: 0.767
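
The F1-Score above is the harmonic mean of the reported precision and recall. The short check below recomputes it from the two rounded figures; `f1_score` is a hypothetical helper name, and the check assumes the metrics refer to the TOXIC class, which this page does not state.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported under Validation Metrics.
precision, recall = 0.920, 0.657
print(round(f1_score(precision, recall), 3))  # prints 0.767
```

Under that assumption, a precision of 0.920 with a recall of 0.657 means that most messages flagged as toxic are toxic, while roughly a third of the toxic messages in the validation set go undetected.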

Model size: 125M params (Safetensors, tensor type F32)