DanielCano
committed on
Commit
·
cc1bbd9
1
Parent(s):
4217bf2
Update README.md
README.md
CHANGED
@@ -6,24 +6,24 @@ widget:
|
|
6 |
|
7 |
# Spanish News Classification Headlines
|
8 |
|
9 |
-
SNCH: this model was
|
10 |
|
11 |
|
12 |
-
## Dataset Sample
|
13 |
|
14 |
Dataset size : 1000
|
15 |
|
16 |
Columns: idTask,task content 1,idTag,tag.
|
17 |
|
18 |
-
|
|
19 |
-
|
20 |
-
|
|
21 |
-
|
|
22 |
-
|
|
23 |
-
|
|
24 |
-
|
|
25 |
-
|
|
26 |
-
|
|
27 |
|
28 |
|
29 |
## Labels:
|
@@ -61,7 +61,7 @@ from transformers import AutoTokenizer, BertForSequenceClassification,TextClassi
|
|
61 |
|
62 |
|
63 |
review_text = 'los vehiculos que esten esperando pasajaeros deberan estar apagados para reducir emisiones'
|
64 |
-
path = "M47Labs/
|
65 |
tokenizer = AutoTokenizer.from_pretrained(path)
|
66 |
model = BertForSequenceClassification.from_pretrained(path)
|
67 |
|
@@ -74,7 +74,7 @@ print(nlp(review_text))
|
|
74 |
|
75 |
```
|
76 |
|
77 |
-
```[{'label': 'medio_ambiente', 'score': 0.
|
78 |
|
79 |
### Pytorch
|
80 |
|
@@ -84,7 +84,7 @@ import torch
|
|
84 |
from transformers import AutoTokenizer, BertForSequenceClassification,TextClassificationPipeline
|
85 |
import numpy as np
|
86 |
|
87 |
-
model_name = 'M47Labs/
|
88 |
MAX_LEN = 32
|
89 |
|
90 |
|
@@ -119,7 +119,7 @@ print(f'Sentiment : {model.config.id2label[prediction.detach().cpu().numpy()[0]
|
|
119 |
```Review text: las emisiones estan bajando, debido a las medidas ambientales tomadas por el gobierno```
|
120 |
|
121 |
|
122 |
-
```Sentiment :
|
123 |
|
124 |
|
125 |
A more in depth example on how to use the model can be found in this colab notebook: https://colab.research.google.com/drive/1XsKea6oMyEckye2FePW_XN7Rf8v41Cw_?usp=sharing
|
@@ -134,53 +134,15 @@ A more in depth example on how to use the model can be found in this colab noteb
|
|
134 |
* EPOCHS = 5
|
135 |
* LEARNING_RATE = 1e-05
|
136 |
|
137 |
- ## Train Results
-
- |n_example|epoch|loss|acc|
- |------|------|------|------|
- |100|0|2.286327266693115|12.5|
- |100|1|2.018876111507416|40.0|
- |100|2|1.8016730904579163|43.75|
- |100|3|1.6121837735176086|46.25|
- |100|4|1.41565443277359|68.75|
-
- |n_example|epoch|loss|acc|
- |------|------|------|------|
- |500|0|2.0770938420295715|24.5|
- |500|1|1.6953029704093934|50.25|
- |500|2|1.258900796175003|64.25|
- |500|3|0.8342628020048142|78.25|
- |500|4|0.5135736921429634|90.25|
-
- |n_example|epoch|loss|acc|
- |------|------|------|------|
- |1000|0|1.916002897115854|36.1997226074896|
- |1000|1|1.2941598492664295|62.2746185852982|
- |1000|2|0.8201534710415117|76.97642163661581|
- |1000|3|0.524806430051615|86.9625520110957|
- |1000|4|0.30662027455784463|92.64909847434119|
|
162 |
|
163 |
## Validation Results
|
164 |
|
165 |
-
|
|
166 |
-
|------|------|
|
167 |
-
|Accuracy Score|0.35|
|
168 |
-
|Precision (Macro)|0.35|
|
169 |
-
|Recall (Macro)|0.16|
|
170 |
-
|
171 |
-
|n_examples|500|
|
172 |
-
|------|------|
|
173 |
-
|Accuracy Score|0.62|
|
174 |
-
|Precision (Macro)|0.60|
|
175 |
-
|Recall (Macro)|0.47|
|
176 |
-
|
177 |
-
|n_examples|1000|
|
178 |
|------|------|
|
179 |
-
|Accuracy Score|0.
|
180 |
-
|Precision(Macro)|0.
|
181 |
-
|Recall (Macro)|0.
|
182 |
|
183 |
|
184 |
|
185 |
![alt text](https://media-exp1.licdn.com/dms/image/C4D0BAQHpfgjEyhtE1g/company-logo_200_200/0/1625210573748?e=1638403200&v=beta&t=toQNpiOlyim5Ja4f7Ejv8yKoCWifMsLWjkC7XnyXICI "Logo M47")
|
186 |
-
|
|
|
6 |
|
7 |
# Spanish News Classification Headlines
|
8 |
|
9 |
+
SNCH: this model was developed by [M47Labs](https://www.m47labs.com/es/) for text classification. The base model is [BETO](https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased); however, this model has not been fine-tuned on any dataset. The objective is to show how the base model performs at inference without any training.
|
10 |
|
11 |
|
12 |
+
## Validation Dataset Sample
|
13 |
|
14 |
Dataset size : 1000
|
15 |
|
16 |
Columns: idTask,task content 1,idTag,tag.
|
17 |
|
18 |
+ |task content|tag|
+ |------|------|
+ |Alcalá de Guadaíra celebra la IV Semana de la Diversidad Sexual con acciones de sensibilización|sociedad|
+ |El Archipiélago Chinijo Graciplus se impone en el Trofeo Centro Comercial Rubicón|deportes|
+ |Un total de 39 personas padecen ELA actualmente en la provincia|sociedad|
+ |Eurocopa 2021 : Italia vence a Gales y pasa a octavos con su candidatura reforzada|deportes|
+ |Resolución de 10 de junio de 2021, del Ayuntamiento de Tarazona de La Mancha (Albacete), referente a la convocatoria para proveer una plaza.|sociedad|
+ |El primer ministro sueco pierde una moción de censura|politica|
+ |El dólar se dispara tras la reunión de la Fed|economia|
|
27 |
|
28 |
|
29 |
## Labels:
|
|
|
61 |
|
62 |
|
63 |
review_text = 'los vehiculos que esten esperando pasajaeros deberan estar apagados para reducir emisiones'
|
64 |
+
path = "M47Labs/spanish_news_classification_headlines_untrained"
|
65 |
tokenizer = AutoTokenizer.from_pretrained(path)
|
66 |
model = BertForSequenceClassification.from_pretrained(path)
|
67 |
|
|
|
74 |
|
75 |
```
|
76 |
|
77 |
+
```[{'label': 'medio_ambiente', 'score': 0.2834321384291023}]```
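The `score` in the output above is presumably a softmax probability over the model's class logits, which is the usual default for a single-label text classification pipeline. The sketch below illustrates that step in plain Python. The logit values and the three-label `id2label` mapping are hypothetical, chosen only for illustration, and are not taken from the model.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shifted = [x - max(logits) for x in logits]  # subtract the max for numerical stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical values: a real model has one logit per label in its config.
id2label = {0: "deportes", 1: "medio_ambiente", 2: "opinion"}
logits = [0.2, 1.1, -0.3]

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)  # index of the highest probability
result = [{"label": id2label[best], "score": probs[best]}]
print(result)
```

A low top score, such as the one reported above, is consistent with a classification head that has not been trained, since its probabilities tend to stay close to uniform.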
|
78 |
|
79 |
### Pytorch
|
80 |
|
|
|
84 |
from transformers import AutoTokenizer, BertForSequenceClassification,TextClassificationPipeline
|
85 |
import numpy as np
|
86 |
|
87 |
+
model_name = 'M47Labs/spanish_news_classification_headlines_untrained'
|
88 |
MAX_LEN = 32
|
89 |
|
90 |
|
|
|
119 |
```Review text: las emisiones estan bajando, debido a las medidas ambientales tomadas por el gobierno```
|
120 |
|
121 |
|
122 |
+
```Sentiment : opinion```
|
123 |
|
124 |
|
125 |
A more in depth example on how to use the model can be found in this colab notebook: https://colab.research.google.com/drive/1XsKea6oMyEckye2FePW_XN7Rf8v41Cw_?usp=sharing
|
|
|
134 |
* EPOCHS = 5
|
135 |
* LEARNING_RATE = 1e-05
|
136 |
|
137 |
|
138 |
## Validation Results
|
139 |
|
140 |
+
|Full Dataset||
|
141 |
|------|------|
|
142 |
+
|Accuracy Score|0.362|
|
143 |
+
|Precision (Macro)|0.21|
|
144 |
+
|Recall (Macro)|0.22|
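The README does not say how these scores were computed. As a rough sketch, assuming "Macro" means the unweighted mean of per-class precision and recall, they could be derived from true and predicted tags as follows. The function name `macro_scores` and the toy label lists are hypothetical and are not the validation data.

```python
from collections import defaultdict

def macro_scores(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall) for two label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    accuracy = sum(tp.values()) / len(y_true)
    # A class with no predictions (or no true examples) contributes 0.0.
    precisions = [tp[l] / (tp[l] + fp[l]) if tp[l] + fp[l] else 0.0 for l in labels]
    recalls = [tp[l] / (tp[l] + fn[l]) if tp[l] + fn[l] else 0.0 for l in labels]
    return accuracy, sum(precisions) / len(labels), sum(recalls) / len(labels)

# Toy example only: tags borrowed from the sample table, predictions invented.
y_true = ["sociedad", "deportes", "sociedad", "deportes", "politica", "economia"]
y_pred = ["sociedad", "sociedad", "sociedad", "deportes", "sociedad", "economia"]
accuracy, precision, recall = macro_scores(y_true, y_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Because macro averaging weights every class equally, classes the model never predicts pull precision and recall well below accuracy, which is the pattern the table above shows.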
|
145 |
|
146 |
|
147 |
|
148 |
![alt text](https://media-exp1.licdn.com/dms/image/C4D0BAQHpfgjEyhtE1g/company-logo_200_200/0/1625210573748?e=1638403200&v=beta&t=toQNpiOlyim5Ja4f7Ejv8yKoCWifMsLWjkC7XnyXICI "Logo M47")
|
|