qanastek committed on
Commit 217277b · 1 Parent(s): 956578e

Update README.md

Files changed (3)
  1. 15_epochs_run.log +67 -0
  2. README.md +278 -0
  3. predict.py +9 -0
15_epochs_run.log ADDED
@@ -0,0 +1,67 @@
+                           precision    recall  f1-score   support
+
+              alarm_query     0.9661    0.9037    0.9338      1734
+             alarm_remove     0.9484    0.9608    0.9545      1071
+                alarm_set     0.8611    0.9254    0.8921      2091
+        audio_volume_down     0.8657    0.9537    0.9075       561
+        audio_volume_mute     0.8608    0.9130    0.8861      1632
+       audio_volume_other     0.8684    0.5392    0.6653       306
+          audio_volume_up     0.7198    0.8446    0.7772       663
+           calendar_query     0.7555    0.8229    0.7878      6426
+          calendar_remove     0.8688    0.9441    0.9049      3417
+             calendar_set     0.9092    0.9014    0.9053     10659
+            cooking_query     0.0000    0.0000    0.0000         0
+           cooking_recipe     0.9282    0.8592    0.8924      3672
+         datetime_convert     0.8144    0.7686    0.7909       765
+           datetime_query     0.9152    0.9305    0.9228      4488
+         email_addcontact     0.6482    0.8431    0.7330       612
+              email_query     0.9629    0.9319    0.9472      6069
+       email_querycontact     0.6853    0.8032    0.7396      1326
+          email_sendemail     0.9530    0.9381    0.9455      5814
+            general_greet     0.1026    0.3922    0.1626        51
+             general_joke     0.9305    0.9123    0.9213       969
+           general_quirky     0.6984    0.5417    0.6102      8619
+             iot_cleaning     0.9590    0.9359    0.9473      1326
+               iot_coffee     0.9304    0.9749    0.9521      1836
+      iot_hue_lightchange     0.8794    0.9374    0.9075      1836
+         iot_hue_lightdim     0.8695    0.8711    0.8703      1071
+         iot_hue_lightoff     0.9440    0.9229    0.9334      2193
+          iot_hue_lighton     0.4545    0.5882    0.5128       153
+          iot_hue_lightup     0.9271    0.8315    0.8767      1377
+             iot_wemo_off     0.9615    0.8715    0.9143       918
+              iot_wemo_on     0.8455    0.7941    0.8190       510
+        lists_createoradd     0.8437    0.8356    0.8396      1989
+              lists_query     0.8918    0.8335    0.8617      2601
+             lists_remove     0.9536    0.8601    0.9044      2652
+        music_dislikeness     0.7725    0.7157    0.7430       204
+           music_likeness     0.8570    0.8159    0.8359      1836
+              music_query     0.8667    0.8050    0.8347      1785
+           music_settings     0.4024    0.3301    0.3627       306
+               news_query     0.8343    0.8657    0.8498      6324
+           play_audiobook     0.8172    0.8125    0.8149      2091
+                play_game     0.8666    0.8403    0.8532      1785
+               play_music     0.8683    0.8845    0.8763      8976
+            play_podcasts     0.8925    0.9125    0.9024      3213
+               play_radio     0.8260    0.8935    0.8585      3672
+              qa_currency     0.9459    0.9578    0.9518      1989
+            qa_definition     0.8638    0.8552    0.8595      2907
+               qa_factoid     0.7959    0.8178    0.8067      7191
+                 qa_maths     0.8937    0.9302    0.9116      1275
+                 qa_stock     0.7995    0.9412    0.8646      1326
+    recommendation_events     0.7646    0.7702    0.7674      2193
+ recommendation_locations     0.7489    0.8830    0.8104      1581
+    recommendation_movies     0.6907    0.7706    0.7285      1020
+              social_post     0.9623    0.9080    0.9344      4131
+             social_query     0.8104    0.7914    0.8008      1275
+           takeaway_order     0.7697    0.8458    0.8059      1122
+           takeaway_query     0.9059    0.8571    0.8808      1785
+          transport_query     0.8141    0.7559    0.7839      2601
+           transport_taxi     0.9222    0.9403    0.9312      1173
+         transport_ticket     0.9259    0.9384    0.9321      1785
+        transport_traffic     0.6919    0.9660    0.8063       765
+            weather_query     0.9387    0.9492    0.9439      7956
+
+                 accuracy                         0.8617    151674
+                macro avg     0.8162    0.8273    0.8178    151674
+             weighted avg     0.8639    0.8617    0.8613    151674
+
README.md CHANGED
@@ -1,3 +1,281 @@
  ---
+ tags:
+ - Transformers
+ - text-classification
+ - intent-classification
+ - multi-class-classification
+ - natural-language-understanding
+ languages:
+ - af-ZA
+ - am-ET
+ - ar-SA
+ - az-AZ
+ - bn-BD
+ - cy-GB
+ - da-DK
+ - de-DE
+ - el-GR
+ - en-US
+ - es-ES
+ - fa-IR
+ - fi-FI
+ - fr-FR
+ - he-IL
+ - hi-IN
+ - hu-HU
+ - hy-AM
+ - id-ID
+ - is-IS
+ - it-IT
+ - ja-JP
+ - jv-ID
+ - ka-GE
+ - km-KH
+ - kn-IN
+ - ko-KR
+ - lv-LV
+ - ml-IN
+ - mn-MN
+ - ms-MY
+ - my-MM
+ - nb-NO
+ - nl-NL
+ - pl-PL
+ - pt-PT
+ - ro-RO
+ - ru-RU
+ - sl-SL
+ - sq-AL
+ - sv-SE
+ - sw-KE
+ - ta-IN
+ - te-IN
+ - th-TH
+ - tl-PH
+ - tr-TR
+ - ur-PK
+ - vi-VN
+ - zh-CN
+ - zh-TW
+ multilinguality:
+ - af-ZA
+ - am-ET
+ - ar-SA
+ - az-AZ
+ - bn-BD
+ - cy-GB
+ - da-DK
+ - de-DE
+ - el-GR
+ - en-US
+ - es-ES
+ - fa-IR
+ - fi-FI
+ - fr-FR
+ - he-IL
+ - hi-IN
+ - hu-HU
+ - hy-AM
+ - id-ID
+ - is-IS
+ - it-IT
+ - ja-JP
+ - jv-ID
+ - ka-GE
+ - km-KH
+ - kn-IN
+ - ko-KR
+ - lv-LV
+ - ml-IN
+ - mn-MN
+ - ms-MY
+ - my-MM
+ - nb-NO
+ - nl-NL
+ - pl-PL
+ - pt-PT
+ - ro-RO
+ - ru-RU
+ - sl-SL
+ - sq-AL
+ - sv-SE
+ - sw-KE
+ - ta-IN
+ - te-IN
+ - th-TH
+ - tl-PH
+ - tr-TR
+ - ur-PK
+ - vi-VN
+ - zh-CN
+ - zh-TW
+ datasets:
+ - qanastek/MASSIVE
+ widget:
+ - text: "réveille-moi à neuf heures du matin le vendredi"
  license: cc-by-4.0
  ---
+
+ **People Involved**
+
+ * [LABRAK Yanis](https://www.linkedin.com/in/yanis-labrak-8a7412145/) (1)
+
+ **Affiliations**
+
+ 1. [LIA, NLP team](https://lia.univ-avignon.fr/), Avignon University, Avignon, France.
+
+ ## Demo: How to use in HuggingFace Transformers
+
+ Requires [transformers](https://pypi.org/project/transformers/): ```pip install transformers```
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline
+
+ model_name = 'qanastek/XLMRoberta-Alexa-Intents-Classification'
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+ classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer)
+
+ res = classifier("réveille-moi à neuf heures du matin le vendredi")
+ print(res)
+ ```
+
+ ## Training data
+
+ [MASSIVE](https://huggingface.co/datasets/qanastek/MASSIVE) is a parallel dataset of > 1M utterances across 51 languages with annotations for the Natural Language Understanding tasks of intent prediction and slot annotation. Utterances span 60 intents and include 55 slot types. MASSIVE was created by localizing the SLURP dataset, composed of general Intelligent Voice Assistant single-shot interactions.
+
+ ## Intents
+
+ ```plain
+ audio_volume_other
+ play_music
+ iot_hue_lighton
+ general_greet
+ calendar_set
+ audio_volume_down
+ social_query
+ audio_volume_mute
+ iot_wemo_on
+ iot_hue_lightup
+ audio_volume_up
+ iot_coffee
+ takeaway_query
+ qa_maths
+ play_game
+ cooking_query
+ iot_hue_lightdim
+ iot_wemo_off
+ music_settings
+ weather_query
+ news_query
+ alarm_remove
+ social_post
+ recommendation_events
+ transport_taxi
+ takeaway_order
+ music_query
+ calendar_query
+ lists_query
+ qa_currency
+ recommendation_movies
+ general_joke
+ recommendation_locations
+ email_querycontact
+ lists_remove
+ play_audiobook
+ email_addcontact
+ lists_createoradd
+ play_radio
+ qa_stock
+ alarm_query
+ email_sendemail
+ general_quirky
+ music_likeness
+ cooking_recipe
+ email_query
+ datetime_query
+ transport_traffic
+ play_podcasts
+ iot_hue_lightchange
+ calendar_remove
+ transport_query
+ transport_ticket
+ qa_factoid
+ iot_cleaning
+ alarm_set
+ datetime_convert
+ iot_hue_lightoff
+ qa_definition
+ music_dislikeness
+ ```
+
+ ## Evaluation results
+
+ ```plain
+                           precision    recall  f1-score   support
+
+              alarm_query     0.9661    0.9037    0.9338      1734
+             alarm_remove     0.9484    0.9608    0.9545      1071
+                alarm_set     0.8611    0.9254    0.8921      2091
+        audio_volume_down     0.8657    0.9537    0.9075       561
+        audio_volume_mute     0.8608    0.9130    0.8861      1632
+       audio_volume_other     0.8684    0.5392    0.6653       306
+          audio_volume_up     0.7198    0.8446    0.7772       663
+           calendar_query     0.7555    0.8229    0.7878      6426
+          calendar_remove     0.8688    0.9441    0.9049      3417
+             calendar_set     0.9092    0.9014    0.9053     10659
+            cooking_query     0.0000    0.0000    0.0000         0
+           cooking_recipe     0.9282    0.8592    0.8924      3672
+         datetime_convert     0.8144    0.7686    0.7909       765
+           datetime_query     0.9152    0.9305    0.9228      4488
+         email_addcontact     0.6482    0.8431    0.7330       612
+              email_query     0.9629    0.9319    0.9472      6069
+       email_querycontact     0.6853    0.8032    0.7396      1326
+          email_sendemail     0.9530    0.9381    0.9455      5814
+            general_greet     0.1026    0.3922    0.1626        51
+             general_joke     0.9305    0.9123    0.9213       969
+           general_quirky     0.6984    0.5417    0.6102      8619
+             iot_cleaning     0.9590    0.9359    0.9473      1326
+               iot_coffee     0.9304    0.9749    0.9521      1836
+      iot_hue_lightchange     0.8794    0.9374    0.9075      1836
+         iot_hue_lightdim     0.8695    0.8711    0.8703      1071
+         iot_hue_lightoff     0.9440    0.9229    0.9334      2193
+          iot_hue_lighton     0.4545    0.5882    0.5128       153
+          iot_hue_lightup     0.9271    0.8315    0.8767      1377
+             iot_wemo_off     0.9615    0.8715    0.9143       918
+              iot_wemo_on     0.8455    0.7941    0.8190       510
+        lists_createoradd     0.8437    0.8356    0.8396      1989
+              lists_query     0.8918    0.8335    0.8617      2601
+             lists_remove     0.9536    0.8601    0.9044      2652
+        music_dislikeness     0.7725    0.7157    0.7430       204
+           music_likeness     0.8570    0.8159    0.8359      1836
+              music_query     0.8667    0.8050    0.8347      1785
+           music_settings     0.4024    0.3301    0.3627       306
+               news_query     0.8343    0.8657    0.8498      6324
+           play_audiobook     0.8172    0.8125    0.8149      2091
+                play_game     0.8666    0.8403    0.8532      1785
+               play_music     0.8683    0.8845    0.8763      8976
+            play_podcasts     0.8925    0.9125    0.9024      3213
+               play_radio     0.8260    0.8935    0.8585      3672
+              qa_currency     0.9459    0.9578    0.9518      1989
+            qa_definition     0.8638    0.8552    0.8595      2907
+               qa_factoid     0.7959    0.8178    0.8067      7191
+                 qa_maths     0.8937    0.9302    0.9116      1275
+                 qa_stock     0.7995    0.9412    0.8646      1326
+    recommendation_events     0.7646    0.7702    0.7674      2193
+ recommendation_locations     0.7489    0.8830    0.8104      1581
+    recommendation_movies     0.6907    0.7706    0.7285      1020
+              social_post     0.9623    0.9080    0.9344      4131
+             social_query     0.8104    0.7914    0.8008      1275
+           takeaway_order     0.7697    0.8458    0.8059      1122
+           takeaway_query     0.9059    0.8571    0.8808      1785
+          transport_query     0.8141    0.7559    0.7839      2601
+           transport_taxi     0.9222    0.9403    0.9312      1173
+         transport_ticket     0.9259    0.9384    0.9321      1785
+        transport_traffic     0.6919    0.9660    0.8063       765
+            weather_query     0.9387    0.9492    0.9439      7956
+
+                 accuracy                         0.8617    151674
+                macro avg     0.8162    0.8273    0.8178    151674
+             weighted avg     0.8639    0.8617    0.8613    151674
+ ```
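Note on reading the evaluation report in this hunk: the layout looks like a scikit-learn style classification report, although the commit does not say which tool produced it, so treat that as an assumption. Under that assumption, each `f1-score` is the harmonic mean of the row's precision and recall, `macro avg` is the unweighted mean over classes, and `weighted avg` weights each class by its `support`. The sketch below recomputes these quantities for three rows copied from the report; the helper names (`f1_score`, `macro_avg`, `weighted_avg`) are hypothetical and not part of the commit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_avg(values):
    """Unweighted mean: every class counts the same, whatever its support."""
    return sum(values) / len(values)


def weighted_avg(values, weights):
    """Support-weighted mean: classes with more examples count more."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(v * w for v, w in zip(values, weights)) / total


# (precision, recall, reported f1, support) copied from three rows of the report.
rows = {
    "alarm_query": (0.9661, 0.9037, 0.9338, 1734),
    "general_greet": (0.1026, 0.3922, 0.1626, 51),
    "cooking_query": (0.0, 0.0, 0.0, 0),
}

for label, (p, r, reported_f1, _support) in rows.items():
    # Small differences are expected because the report rounds to 4 decimals.
    print(f"{label}: recomputed={f1_score(p, r):.4f} reported={reported_f1:.4f}")

f1_values = [row[2] for row in rows.values()]
supports = [row[3] for row in rows.values()]
print("macro over these 3 rows:   ", round(macro_avg(f1_values), 4))
print("weighted over these 3 rows:", round(weighted_avg(f1_values, supports), 4))
```

The three-row averages are only an illustration of the two formulas and are not the report's own `macro avg` and `weighted avg`, which cover all 60 classes. A row with zero support, such as `cooking_query`, contributes nothing to a weighted average but would pull a macro average down if it is included.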
predict.py ADDED
@@ -0,0 +1,9 @@
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline
+
+ model_name = 'qanastek/XLMRoberta-Alexa-Intents-Classification'
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+ classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer)
+
+ res = classifier("réveille-moi à neuf heures du matin le vendredi")
+ print(res)
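
Note on consuming `res`: with default settings, a Transformers text-classification pipeline called on a single string returns a list of dictionaries with `label` and `score` keys. The sketch below shows one way a caller might turn that into a single intent with a confidence cut-off. It runs offline on a placeholder list; the helper `top_intent`, the `min_score` parameter, and the sample score are hypothetical illustrations, not part of the commit and not real model output.

```python
from typing import Dict, List, Optional


def top_intent(predictions: List[Dict], min_score: float = 0.5) -> Optional[str]:
    """Return the highest-scoring label, or None if it scores below min_score."""
    if not predictions:
        return None
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= min_score else None


# Placeholder shaped like the pipeline's output; the score is made up.
sample = [{"label": "alarm_set", "score": 0.97}]

print(top_intent(sample))                  # label is returned: 0.97 >= 0.5
print(top_intent(sample, min_score=0.99))  # None: 0.97 is below the cut-off
```

Returning `None` below the cut-off lets the caller fall back to a clarification prompt instead of acting on a low-confidence intent; the threshold itself would need tuning against the weaker classes in the evaluation report.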