Update README.md
@@ -6,24 +6,24 @@ widget:
6 |
7 |
# Spanish News Classification Headlines
8 |
9 |
SNCH: this model was
10 |
11 |
12 |
## Dataset Sample
13 |
14 |
Dataset size : 1000
15 |
16 |
Columns: idTask,task content 1,idTag,tag.
17 |
18 |
19 |
20 |
21 |
22 |
23 |
24 |
25 |
26 |
27 |
28 |
29 |
## Labels:
@@ -61,7 +61,7 @@ from transformers import AutoTokenizer, BertForSequenceClassification,TextClassi
61 |
62 |
63 |
review_text = 'los vehiculos que esten esperando pasajaeros deberan estar apagados para reducir emisiones'
64 |
path = "M47Labs/
65 |
tokenizer = AutoTokenizer.from_pretrained(path)
66 |
model = BertForSequenceClassification.from_pretrained(path)
67 |
@@ -74,7 +74,7 @@ print(nlp(review_text))
74 |
75 |
76 |
77 |
```[{'label': 'medio_ambiente', 'score': 0.
78 |
79 |
### Pytorch
80 |
@@ -84,7 +84,7 @@ import torch
84 |
from transformers import AutoTokenizer, BertForSequenceClassification,TextClassificationPipeline
85 |
import numpy as np
86 |
87 |
model_name = 'M47Labs/
88 |
MAX_LEN = 32
89 |
90 |
@@ -119,7 +119,7 @@ print(f'Sentiment : {model.config.id2label[prediction.detach().cpu().numpy()[0]
119 |
```Review text: las emisiones estan bajando, debido a las medidas ambientales tomadas por el gobierno```
120 |
121 |
122 |
```Sentiment :
123 |
124 |
125 |
A more in depth example on how to use the model can be found in this colab notebook: https://colab.research.google.com/drive/1XsKea6oMyEckye2FePW_XN7Rf8v41Cw_?usp=sharing
@@ -134,53 +134,15 @@ A more in depth example on how to use the model can be found in this colab noteb
134 |
* EPOCHS = 5
135 |
136 |
137 |
## Train Results
138 |
139 |
140 |
141 |
142 |
143 |
144 |
145 |
146 |
147 |
148 |
149 |
150 |
151 |
152 |
153 |
154 |
155 |
156 |
157 |
158 |
159 |
160 |
161 |
162 |
163 |
## Validation Results
164 |
165 |
166 |
167 |
|Accuracy Score|0.35|
168 |
|Precision (Macro)|0.35|
169 |
|Recall (Macro)|0.16|
170 |
171 |
172 |
173 |
|Accuracy Score|0.62|
174 |
|Precision (Macro)|0.60|
175 |
|Recall (Macro)|0.47|
176 |
177 |
178 |
179 |
|Accuracy Score|0.
180 |
181 |
|Recall (Macro)|0.
182 |
183 |
184 |
185 |
![alt text](https://media-exp1.licdn.com/dms/image/C4D0BAQHpfgjEyhtE1g/company-logo_200_200/0/1625210573748?e=1638403200&v=beta&t=toQNpiOlyim5Ja4f7Ejv8yKoCWifMsLWjkC7XnyXICI "Logo M47")
186 |
6 |
7 |
# Spanish News Classification Headlines
8 |
9 |
SNCH: this model was developed by [M47Labs](https://www.m47labs.com/es/) for text classification. The base model is [BETO](https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased). This model has not been fine-tuned on any dataset: the objective is to show how the base model performs when it is used for inference without any training.
10 |
11 |
12 |
## Validation Dataset Sample
13 |
14 |
Dataset size : 1000
15 |
16 |
Columns: idTask,task content 1,idTag,tag.
17 |
18 |
|task content|tag|
|---|---|
|Alcalá de Guadaíra celebra la IV Semana de la Diversidad Sexual con acciones de sensibilización|sociedad|
|El Archipiélago Chinijo Graciplus se impone en el Trofeo Centro Comercial Rubicón|deportes|
|Un total de 39 personas padecen ELA actualmente en la provincia|sociedad|
|Eurocopa 2021 : Italia vence a Gales y pasa a octavos con su candidatura reforzada|deportes|
|Resolución de 10 de junio de 2021, del Ayuntamiento de Tarazona de La Mancha (Albacete), referente a la convocatoria para proveer una plaza.|sociedad|
|El primer ministro sueco pierde una moción de censura|politica|
|El dólar se dispara tras la reunión de la Fed|economia|
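The sample rows above can be summarized with the standard library to get a quick view of the tag distribution. This is a minimal sketch that assumes a pipe-delimited export with the two columns shown; the README does not state the actual file format or file name, so the inline `SAMPLE` string (a hypothetical stand-in built from rows in the table) is used instead of a real file.

```python
import csv
import io
from collections import Counter

# Assumption: a pipe-delimited export with the two columns shown above.
# The rows are copied from the sample table; the real file is not named here.
SAMPLE = """task content|tag
El primer ministro sueco pierde una moción de censura|politica
El dólar se dispara tras la reunión de la Fed|economia
Un total de 39 personas padecen ELA actualmente en la provincia|sociedad
Eurocopa 2021 : Italia vence a Gales y pasa a octavos con su candidatura reforzada|deportes
El Archipiélago Chinijo Graciplus se impone en el Trofeo Centro Comercial Rubicón|deportes
"""


def tag_counts(text):
    """Count how many headlines carry each tag."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    return Counter(row["tag"] for row in reader)


counts = tag_counts(SAMPLE)
print(counts.most_common())
```

A skewed tag distribution is worth checking before reading macro-averaged metrics, since those weight every tag equally regardless of how often it occurs.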
27 |
28 |
29 |
## Labels:
61 |
62 |
63 |
review_text = 'los vehiculos que esten esperando pasajaeros deberan estar apagados para reducir emisiones'
64 |
path = "M47Labs/spanish_news_classification_headlines_untrained"
65 |
tokenizer = AutoTokenizer.from_pretrained(path)
66 |
model = BertForSequenceClassification.from_pretrained(path)
67 |
74 |
75 |
76 |
77 |
```[{'label': 'medio_ambiente', 'score': 0.2834321384291023}]```
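For a single-label classifier, the `score` in the pipeline output above is by default the softmax probability of the highest-scoring class. The sketch below reproduces that step in plain Python. The logits and the three-entry `id2label` mapping are made-up values for illustration, not taken from this model's configuration.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, id2label):
    """Return the best label and its probability, shaped like the pipeline output."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


# Hypothetical values: not the real logits or label ids of this model.
id2label = {0: "deportes", 1: "medio_ambiente", 2: "politica"}
result = top_label([0.1, 0.4, 0.2], id2label)
print(result)
```

When the logits are close together, as they tend to be for a classification head that has not been trained, the top probability stays low even if the label happens to be correct.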
78 |
79 |
### Pytorch
80 |
84 |
from transformers import AutoTokenizer, BertForSequenceClassification,TextClassificationPipeline
85 |
import numpy as np
86 |
87 |
model_name = 'M47Labs/spanish_news_classification_headlines_untrained'
88 |
MAX_LEN = 32
89 |
90 |
119 |
```Review text: las emisiones estan bajando, debido a las medidas ambientales tomadas por el gobierno```
120 |
121 |
122 |
```Sentiment : opinion```
123 |
124 |
125 |
A more in depth example on how to use the model can be found in this colab notebook: https://colab.research.google.com/drive/1XsKea6oMyEckye2FePW_XN7Rf8v41Cw_?usp=sharing
134 |
* EPOCHS = 5
135 |
136 |
137 |
138 |
## Validation Results
139 |
140 |
|Full Dataset||
|---|---|
|Accuracy Score|0.362|
|Precision (Macro)|0.21|
|Recall (Macro)|0.22|
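Macro-averaged precision and recall are computed per class and then averaged with equal weight, so rare classes count as much as frequent ones. The sketch below shows the calculation in plain Python, using made-up labels rather than this model's predictions. Classes with no predicted samples (or no true samples) are scored 0 here, which is one common convention and an assumption on my part about how the figures above were produced.

```python
def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall) for two label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = sum(t == label for t in y_true)
        # Score a class 0 when it has no predictions or no true samples.
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)
    return accuracy, sum(precisions) / len(labels), sum(recalls) / len(labels)


# Hypothetical labels for illustration only.
y_true = ["deportes", "deportes", "politica", "economia"]
y_pred = ["deportes", "politica", "politica", "politica"]
accuracy, precision, recall = macro_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

A classifier that assigns most inputs to a few classes can reach a moderate accuracy while its macro scores stay low, because every class it never predicts contributes a zero to the average.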
145 |
146 |
147 |
148 |
![alt text](https://media-exp1.licdn.com/dms/image/C4D0BAQHpfgjEyhtE1g/company-logo_200_200/0/1625210573748?e=1638403200&v=beta&t=toQNpiOlyim5Ja4f7Ejv8yKoCWifMsLWjkC7XnyXICI "Logo M47")