antypasd committed on
Commit cce0274 · 1 Parent(s): 7598d5f

Create README.md

  1. README.md +70 -47
README.md CHANGED
@@ -1,47 +1,70 @@
- ---
- tags:
- - generated_from_keras_callback
- model-index:
- - name: twitter-xlm-roberta-base-hate-spanish
-   results: []
- ---
-
- <!-- This model card has been generated automatically according to the information Keras had access to. You should
- probably proofread and complete it, then remove this comment. -->
-
- # twitter-xlm-roberta-base-hate-spanish
-
- This model was trained from scratch on an unknown dataset.
- It achieves the following results on the evaluation set:
-
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - optimizer: None
- - training_precision: float32
-
- ### Training results
-
-
-
- ### Framework versions
-
- - Transformers 4.21.2
- - TensorFlow 2.10.0
- - Datasets 2.9.0
- - Tokenizers 0.12.1
+ # cardiffnlp/twitter-xlm-roberta-base-hate-spanish
+
+ This model is a fine-tuned version of [cardiffnlp/twitter-xlm-roberta-base](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base), trained on the [`HaterNet`](https://zenodo.org/record/2592149) dataset and the Spanish subset of
+ [`SemEval-2019 Task 5`](https://aclanthology.org/S19-2007/).
+
+ ## The following metrics are achieved
+
+ * on the test split of `SemEval-2019 Task 5`
+
+ - F1 (weighted): 0.7866
+ - F1 (macro): 0.7935
+ - Accuracy: 0.7937
+
+ * on a custom test split of `HaterNet`
+
+ - F1 (weighted): 0.7815
+ - F1 (macro): 0.6981
+ - Accuracy: 0.7933
+
+ * on `HaterNet` and `SemEval-2019 Task 5`
+ - F1 (weighted): 0.7908
+ - F1 (macro): 0.7657
+ - Accuracy: 0.7936
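
The gap between macro and weighted F1 on HaterNet (0.6981 versus 0.7815) is the pattern one would expect when one class is much rarer than the other, although the card does not state the class distribution. The following is a minimal sketch of how the three reported metrics relate, written in plain Python on invented toy labels. The `HATE` label name, the helper names, and the toy data are assumptions for illustration only; this is not the evaluation script used for the numbers above.

```python
from collections import Counter


def f1_for_label(y_true, y_pred, label):
    """F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def summarize(y_true, y_pred):
    """Return macro F1, support-weighted F1, and accuracy."""
    labels = sorted(set(y_true))
    support = Counter(y_true)  # number of true examples per class
    scores = {label: f1_for_label(y_true, y_pred, label) for label in labels}
    macro = sum(scores.values()) / len(labels)  # every class counts equally
    weighted = sum(scores[l] * support[l] for l in labels) / len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"f1_macro": macro, "f1_weighted": weighted, "accuracy": accuracy}


# Invented, imbalanced toy data: 8 NOT-HATE examples and 2 HATE examples.
# 'HATE' is an assumed label name; only 'NOT-HATE' appears in the card.
y_true = ["NOT-HATE"] * 8 + ["HATE"] * 2
y_pred = ["NOT-HATE"] * 7 + ["HATE"] + ["HATE", "NOT-HATE"]

metrics = summarize(y_true, y_pred)
print(metrics)
```

On this toy data the minority class has a lower F1, so the macro average falls below the weighted average and the accuracy, which both follow the majority class more closely.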
+
+
+
+ ### Usage
+ Install tweetnlp via pip.
+ ```shell
+ pip install tweetnlp
+ ```
+ Load the model in Python.
+ ```python
+ import tweetnlp
+ model = tweetnlp.Classifier("cardiffnlp/twitter-xlm-roberta-base-hate-spanish")
+ model.predict('Ismael es egocentrico porque se vuelve loca si le dicen que tiene el pelo bonito😂😂😂😂 eso se define con otro objetivo #FirstDates251')
+ # Output: {'label': 'NOT-HATE'}
+
+ ```
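
To label several tweets, one option is to call the classifier once per text and tally the predicted labels. The helper below is only a sketch: `tally_labels` and `stub_predict` are hypothetical names, the `HATE` label is an assumption, and the only thing taken from the card is that a prediction is a dict with a `'label'` key. The stub stands in for the real model so the example runs without downloading anything.

```python
from collections import Counter
from typing import Callable, Dict, Iterable


def tally_labels(predict: Callable[[str], Dict[str, str]],
                 texts: Iterable[str]) -> Counter:
    """Count how often each label is predicted across `texts`."""
    return Counter(predict(text)["label"] for text in texts)


def stub_predict(text: str) -> Dict[str, str]:
    """Hypothetical stand-in for the model's predict method.

    It flags any text containing the word 'odio'; the real model does
    nothing of the sort, and the 'HATE' label name is an assumption.
    """
    return {"label": "HATE" if "odio" in text.lower() else "NOT-HATE"}


tweets = ["Qué pelo tan bonito", "Mensaje de odio de ejemplo", "Buenos días"]
counts = tally_labels(stub_predict, tweets)
print(counts)
```

With the model loaded as in the snippet above, the corresponding call would be `tally_labels(model.predict, tweets)`.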
+
+
+
+ ### Datasets
+ @inproceedings{basile-etal-2019-semeval,
+     title = "{S}em{E}val-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in {T}witter",
+     author = "Basile, Valerio and
+       Bosco, Cristina and
+       Fersini, Elisabetta and
+       Nozza, Debora and
+       Patti, Viviana and
+       Rangel Pardo, Francisco Manuel and
+       Rosso, Paolo and
+       Sanguinetti, Manuela",
+     booktitle = "Proceedings of the 13th International Workshop on Semantic Evaluation",
+     month = jun,
+     year = "2019",
+     address = "Minneapolis, Minnesota, USA",
+     publisher = "Association for Computational Linguistics",
+     url = "https://aclanthology.org/S19-2007",
+     doi = "10.18653/v1/S19-2007",
+     pages = "54--63",
+     abstract = "The paper describes the organization of the SemEval 2019 Task 5 about the detection of hate speech against immigrants and women in Spanish and English messages extracted from Twitter. The task is organized in two related classification subtasks: a main binary subtask for detecting the presence of hate speech, and a finer-grained one devoted to identifying further features in hateful contents such as the aggressive attitude and the target harassed, to distinguish if the incitement is against an individual rather than a group. HatEval has been one of the most popular tasks in SemEval-2019 with a total of 108 submitted runs for Subtask A and 70 runs for Subtask B, from a total of 74 different teams. Data provided for the task are described by showing how they have been collected and annotated. Moreover, the paper provides an analysis and discussion about the participant systems and the results they achieved in both subtasks.",
+ }
+
+ @article{quijano2019haternet,
+     title={HaterNet a system for detecting and analyzing hate speech in Twitter (Version 1.0)[Data set]},
+     author={Quijano-Sanchez, Lara and Kohatsu, Juan Carlos Pereira and Liberatore, Federico and Camacho-Collados, Miguel},
+     journal={Zenodo},
+     year={2019}
+ }