cardiffnlp/twitter-xlm-roberta-base-hate-spanish

Text Classification

Transformers

PyTorch

TensorFlow

xlm-roberta


antypasd committed on Mar 28, 2023

Commit

cce0274

1 Parent(s): 7598d5f

Create README.md

Files changed (1)

README.md +70 -47

@@ -1,47 +1,70 @@
----
-tags:
-- generated_from_keras_callback
-model-index:
-- name: twitter-xlm-roberta-base-hate-spanish
-  results: []
----
-<!-- This model card has been generated automatically according to the information Keras had access to. You should
-probably proofread and complete it, then remove this comment. -->
-# twitter-xlm-roberta-base-hate-spanish
-This model was trained from scratch on an unknown dataset.
-It achieves the following results on the evaluation set:
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- optimizer: None
-- training_precision: float32
-### Training results
-### Framework versions
-- Transformers 4.21.2
-- TensorFlow 2.10.0
-- Datasets 2.9.0
-- Tokenizers 0.12.1

+# cardiffnlp/twitter-xlm-roberta-base-hate-spanish
+This model is a fine-tuned version of [cardiffnlp/twitter-xlm-roberta-base](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base), trained on the [`HaterNet`](https://zenodo.org/record/2592149) dataset and the Spanish subset of
+[`SemEval-2019 Task 5`](https://aclanthology.org/S19-2007/).
+## The following metrics are achieved
+* on the test split of `SemEval-2019 Task 5`
+  - F1 (weighted): 0.7866
+  - F1 (macro): 0.7935
+  - Accuracy: 0.7937
+* on a custom test split of `HaterNet`
+  - F1 (weighted): 0.7815
+  - F1 (macro): 0.6981
+  - Accuracy: 0.7933
+* on `HaterNet` & `SemEval-2019 Task 5`
+  - F1 (weighted): 0.7908
+  - F1 (macro): 0.7657
+  - Accuracy: 0.7936
+### Usage
+Install tweetnlp via pip.
+```shell
+pip install tweetnlp
+```
+Load the model in Python.
+```python
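+import tweetnlp
+
+# Load the fine-tuned Spanish hate-speech classifier from the Hugging Face Hub.
+model = tweetnlp.Classifier("cardiffnlp/twitter-xlm-roberta-base-hate-spanish")
+# Classify a single tweet; the result is a dict holding the predicted label.
+model.predict('Ismael es egocentrico porque se vuelve loca si le dicen que tiene el pelo bonito😂😂😂😂 eso se define con otro objetivo #FirstDates251')
+# >> {'label': 'NOT-HATE'}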
+```
+### Datasets
+@inproceedings{basile-etal-2019-semeval,
+    title = "{S}em{E}val-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in {T}witter",
+    author = "Basile, Valerio  and
+      Bosco, Cristina  and
+      Fersini, Elisabetta  and
+      Nozza, Debora  and
+      Patti, Viviana  and
+      Rangel Pardo, Francisco Manuel  and
+      Rosso, Paolo  and
+      Sanguinetti, Manuela",
+    booktitle = "Proceedings of the 13th International Workshop on Semantic Evaluation",
+    month = jun,
+    year = "2019",
+    address = "Minneapolis, Minnesota, USA",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/S19-2007",
+    doi = "10.18653/v1/S19-2007",
+    pages = "54--63",
+    abstract = "The paper describes the organization of the SemEval 2019 Task 5 about the detection of hate speech against immigrants and women in Spanish and English messages extracted from Twitter. The task is organized in two related classification subtasks: a main binary subtask for detecting the presence of hate speech, and a finer-grained one devoted to identifying further features in hateful contents such as the aggressive attitude and the target harassed, to distinguish if the incitement is against an individual rather than a group. HatEval has been one of the most popular tasks in SemEval-2019 with a total of 108 submitted runs for Subtask A and 70 runs for Subtask B, from a total of 74 different teams. Data provided for the task are described by showing how they have been collected and annotated. Moreover, the paper provides an analysis and discussion about the participant systems and the results they achieved in both subtasks.",
+}
+@article{quijano2019haternet,
+  title={HaterNet a system for detecting and analyzing hate speech in Twitter (Version 1.0)[Data set]},
+  author={Quijano-Sanchez, Lara and Kohatsu, Juan Carlos Pereira and Liberatore, Federico and Camacho-Collados, Miguel},
+  journal={Zenodo},
+  year={2019}
+}
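
The README added above reports accuracy, macro F1, and weighted F1 for each test set. As a minimal sketch of how these three metrics differ, the snippet below computes them in plain Python on a small, made-up label set. The function name `f1_scores` and the toy labels are illustrative assumptions, not part of the commit or of the authors' evaluation code. One common reason for macro F1 to fall below weighted F1 is class imbalance; whether that explains the `HaterNet` figures is an assumption, since the README does not say.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (accuracy, macro F1, weighted F1) for two equal-length label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class
    per_class = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class[label] = 2 * tp / denom if denom else 0.0
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Macro F1: unweighted mean over classes, so a rare class counts as much as a common one.
    macro = sum(per_class.values()) / len(labels)
    # Weighted F1: mean over classes weighted by how many true examples each class has.
    weighted = sum(per_class[l] * support[l] for l in labels) / len(y_true)
    return accuracy, macro, weighted


# Toy, made-up data: 8 NOT-HATE and 2 HATE examples (hypothetical, for illustration only).
y_true = ["NOT-HATE"] * 8 + ["HATE"] * 2
y_pred = ["NOT-HATE"] * 7 + ["HATE"] + ["HATE", "NOT-HATE"]

accuracy, macro, weighted = f1_scores(y_true, y_pred)
print(f"accuracy={accuracy:.4f} macro_f1={macro:.4f} weighted_f1={weighted:.4f}")
```

In this toy case the minority class has a lower per-class F1, which pulls the macro average down while the weighted average stays close to accuracy.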