Commit 715b9aa by sdantonio · verified · 1 Parent(s): 0d87e96

Add BERTopic model

README.md ADDED
@@ -0,0 +1,75 @@
+
+ ---
+ tags:
+ - bertopic
+ library_name: bertopic
+ pipeline_tag: text-classification
+ ---
+
+ # BERTopic_astrosenmovimiento
+
+ This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
+ BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
+
+ ## Usage
+
+ To use this model, please install BERTopic:
+
+ ```
+ pip install -U bertopic
+ ```
+
+ You can use the model as follows:
+
+ ```python
+ from bertopic import BERTopic
+ topic_model = BERTopic.load("sdantonio/BERTopic_astrosenmovimiento")
+
+ topic_model.get_topic_info()
+ ```
+
+ ## Topic overview
+
+ * Number of topics: 6
+ * Number of training documents: 404
+
+ <details>
+ <summary>Click here for an overview of all topics.</summary>
+
+ | Topic ID | Topic Keywords | Topic Frequency | Label |
+ |----------|----------------|-----------------|-------|
+ | -1 | inenarrable - queridas - encuadrando - reiniciar - encapsulan | 16 | -1_inenarrable_queridas_encuadrando_reiniciar |
+ | 0 | genes - tardes - bellas - mercurio - venus | 6 | 0_genes_tardes_bellas_mercurio |
+ | 1 | adentro - venus - mercurio - escorpio - suen | 151 | 1_adentro_venus_mercurio_escorpio |
+ | 2 | bellas - venus - eclipse - comparto - pluto | 129 | 2_bellas_venus_eclipse_comparto |
+ | 3 | bellas - mercurio - escorpio - venus - comparto | 58 | 3_bellas_mercurio_escorpio_venus |
+ | 4 | historias - lxs - sinergia - bellas - venus | 44 | 4_historias_lxs_sinergia_bellas |
+
+ </details>
+
+ ## Training hyperparameters
+
+ * calculate_probabilities: False
+ * language: None
+ * low_memory: False
+ * min_topic_size: 10
+ * n_gram_range: (1, 1)
+ * nr_topics: None
+ * seed_topic_list: None
+ * top_n_words: 10
+ * verbose: False
+ * zeroshot_min_similarity: 0.7
+ * zeroshot_topic_list: None
+
+ ## Framework versions
+
+ * Numpy: 1.23.5
+ * HDBSCAN: 0.8.38.post1
+ * UMAP: 0.5.6
+ * Pandas: 2.2.2
+ * Scikit-Learn: 1.5.1
+ * Sentence-transformers: 3.0.1
+ * Transformers: 4.44.2
+ * Numba: 0.60.0
+ * Plotly: 5.24.0
+ * Python: 3.10.12
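
Reviewer note: the versions listed under "Framework versions" can be compared against a local environment before loading the model. The following is a minimal sketch and not part of the commit. It assumes the PyPI distribution names shown in `PINNED` (for example `umap-learn` for UMAP), and the helper name `find_mismatches` is hypothetical.

```python
from importlib import metadata

# Versions copied from the README; the keys are assumed PyPI distribution names.
PINNED = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.38.post1",
    "umap-learn": "0.5.6",
    "pandas": "2.2.2",
    "scikit-learn": "1.5.1",
    "sentence-transformers": "3.0.1",
    "transformers": "4.44.2",
    "numba": "0.60.0",
    "plotly": "5.24.0",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {name: (wanted, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(PINNED).items():
        print(f"{name}: README lists {wanted}, environment has {installed}")
```

An exact-match comparison is deliberately strict; whether a nearby version also loads the model correctly is not something this diff can confirm.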
config.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "calculate_probabilities": false,
+ "language": null,
+ "low_memory": false,
+ "min_topic_size": 10,
+ "n_gram_range": [
+ 1,
+ 1
+ ],
+ "nr_topics": null,
+ "seed_topic_list": null,
+ "top_n_words": 10,
+ "verbose": false,
+ "zeroshot_min_similarity": 0.7,
+ "zeroshot_topic_list": null
+ }
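
Reviewer note: `config.json` stores the same hyperparameters that the README lists, with one representational difference: JSON has no tuple type, so `n_gram_range` appears as the list `[1, 1]` while the README shows `(1, 1)`. The sketch below, which is not part of the commit, reads the config text and normalises that field. The helper name `config_to_kwargs` is hypothetical, and passing the result to `BERTopic(**kwargs)` would assume the installed BERTopic version accepts every key.

```python
import json

# Contents of config.json as added in this commit.
CONFIG_TEXT = """
{
  "calculate_probabilities": false,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 1],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": false,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null
}
"""


def config_to_kwargs(text):
    """Parse the config and convert n_gram_range from a list to a tuple."""
    kwargs = dict(json.loads(text))
    if isinstance(kwargs.get("n_gram_range"), list):
        kwargs["n_gram_range"] = tuple(kwargs["n_gram_range"])
    return kwargs


kwargs = config_to_kwargs(CONFIG_TEXT)
print(kwargs["n_gram_range"])
```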
ctfidf.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:59a85f8d000559124f24c0820f4736fffec61dada01e0608f80ed6acea679638
+ size 216020
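
Reviewer note: the three lines above are a Git LFS pointer, not the tensor data itself. The pointer records the SHA-256 digest and byte size of the real file. As a rough sketch (not part of the commit, with hypothetical helper names `parse_lfs_pointer` and `matches_pointer`), a downloaded file could be checked against the pointer like this:

```python
import hashlib

# Pointer text for ctfidf.safetensors as added in this commit.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:59a85f8d000559124f24c0820f4736fffec61dada01e0608f80ed6acea679638
size 216020
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line and unpack the 'algorithm:digest' oid."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True when the bytes have the recorded size and digest."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algorithm"], data).hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algorithm"], pointer["size"])
```

To use it on a real download, read the file in binary mode and pass the bytes to `matches_pointer`. The same check applies to the `topic_embeddings.safetensors` pointer.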
ctfidf_config.json ADDED
The diff for this file is too large to render. See raw diff
 
topic_embeddings.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6522b9f386dc44968d8b5e671b2ffb829c0dbb679d1d5599c05411ab3f9be9f3
+ size 24664
topics.json ADDED
@@ -0,0 +1,713 @@
+ {
+ "topic_representations": {
+ "-1": [
+ [
+ "inenarrable",
+ 0.9928661584854126
+ ],
+ [
+ "queridas",
+ 0.9925734400749207
+ ],
+ [
+ "encuadrando",
+ 0.9918133020401001
+ ],
+ [
+ "reiniciar",
+ 0.9914272427558899
+ ],
+ [
+ "encapsulan",
+ 0.9913591742515564
+ ],
+ [
+ "justxs",
+ 0.9912671446800232
+ ],
+ [
+ "comparto",
+ 0.9909676313400269
+ ],
+ [
+ "contemplarlas",
+ 0.9907957911491394
+ ],
+ [
+ "mentoras",
+ 0.9907957315444946
+ ],
+ [
+ "aislada",
+ 0.9907400608062744
+ ]
+ ],
+ "0": [
+ [
+ "genes",
+ 0.9928173422813416
+ ],
+ [
+ "tardes",
+ 0.9923921227455139
+ ],
+ [
+ "bellas",
+ 0.9923911094665527
+ ],
+ [
+ "mercurio",
+ 0.992171585559845
+ ],
+ [
+ "venus",
+ 0.9920682311058044
+ ],
+ [
+ "comparto",
+ 0.9919589161872864
+ ],
+ [
+ "pluto",
+ 0.9919396042823792
+ ],
+ [
+ "preludio",
+ 0.9918721914291382
+ ],
+ [
+ "marte",
+ 0.9917718768119812
+ ],
+ [
+ "peri",
+ 0.9917409420013428
+ ]
+ ],
+ "1": [
+ [
+ "adentro",
+ 0.9881443977355957
+ ],
+ [
+ "venus",
+ 0.98813796043396
+ ],
+ [
+ "mercurio",
+ 0.9880510568618774
+ ],
+ [
+ "escorpio",
+ 0.9879761934280396
+ ],
+ [
+ "suen",
+ 0.9875411987304688
+ ],
+ [
+ "marte",
+ 0.9873085618019104
+ ],
+ [
+ "conjuncio",
+ 0.9868541359901428
+ ],
+ [
+ "consciencia",
+ 0.9868273735046387
+ ],
+ [
+ "capricornio",
+ 0.9862940907478333
+ ],
+ [
+ "ndose",
+ 0.9861818552017212
+ ]
+ ],
+ "2": [
+ [
+ "bellas",
+ 0.9899139404296875
+ ],
+ [
+ "venus",
+ 0.9897359609603882
+ ],
+ [
+ "eclipse",
+ 0.9895275235176086
+ ],
+ [
+ "comparto",
+ 0.9893816113471985
+ ],
+ [
+ "pluto",
+ 0.9892137050628662
+ ],
+ [
+ "ciclaje",
+ 0.9891499876976013
+ ],
+ [
+ "marte",
+ 0.9889230728149414
+ ],
+ [
+ "plenilunio",
+ 0.9888471961021423
+ ],
+ [
+ "aries",
+ 0.9887511730194092
+ ],
+ [
+ "mengua",
+ 0.9885473251342773
+ ]
+ ],
+ "3": [
+ [
+ "bellas",
+ 0.9917415380477905
+ ],
+ [
+ "mercurio",
+ 0.9916486740112305
+ ],
+ [
+ "escorpio",
+ 0.9915724396705627
+ ],
+ [
+ "venus",
+ 0.9915477633476257
+ ],
+ [
+ "comparto",
+ 0.9913554787635803
+ ],
+ [
+ "pluto",
+ 0.9912938475608826
+ ],
+ [
+ "ciclaje",
+ 0.9911738634109497
+ ],
+ [
+ "ando",
+ 0.9911345839500427
+ ],
+ [
+ "marte",
+ 0.9910805225372314
+ ],
+ [
+ "suen",
+ 0.9909937977790833
+ ]
+ ],
+ "4": [
+ [
+ "historias",
+ 0.9913379549980164
+ ],
+ [
+ "lxs",
+ 0.9908130168914795
+ ],
+ [
+ "sinergia",
+ 0.9907007217407227
+ ],
+ [
+ "bellas",
+ 0.990537166595459
+ ],
+ [
+ "venus",
+ 0.9902072548866272
+ ],
+ [
+ "ltimos",
+ 0.9901100993156433
+ ],
+ [
+ "cuestio",
+ 0.9900380969047546
+ ],
+ [
+ "levanto",
+ 0.990032434463501
+ ],
+ [
+ "ciclaje",
+ 0.9899183511734009
+ ],
+ [
+ "deo",
+ 0.9897480607032776
+ ]
+ ]
+ },
+ "topics": [
+ 2,
+ 0,
+ 3,
+ 1,
+ 1,
+ 0,
+ 3,
+ 3,
+ 0,
+ 1,
+ 2,
+ 0,
+ 0,
+ 1,
+ 0,
+ 0,
+ 1,
+ 1,
+ 2,
+ 1,
+ 0,
+ 0,
+ 1,
+ 0,
+ 2,
+ 0,
+ 0,
+ 3,
+ 1,
+ 3,
+ 0,
+ 1,
+ 2,
+ 4,
+ 3,
+ 1,
+ 0,
+ 0,
+ 1,
+ 1,
+ 4,
+ 1,
+ 3,
+ 2,
+ 1,
+ 2,
+ 1,
+ 0,
+ 2,
+ -1,
+ 1,
+ 4,
+ 1,
+ 1,
+ 0,
+ 0,
+ 0,
+ 1,
+ 2,
+ 1,
+ 0,
+ 1,
+ 1,
+ 0,
+ 3,
+ 0,
+ 0,
+ 1,
+ 3,
+ 2,
+ 1,
+ 1,
+ 0,
+ 2,
+ 0,
+ 0,
+ 1,
+ 0,
+ 0,
+ 2,
+ 0,
+ 4,
+ 1,
+ 1,
+ 1,
+ 0,
+ 0,
+ 3,
+ 3,
+ 1,
+ 0,
+ 1,
+ 0,
+ 0,
+ 4,
+ 2,
+ 0,
+ 0,
+ 1,
+ 0,
+ 0,
+ 1,
+ 1,
+ 2,
+ 0,
+ 0,
+ 1,
+ 1,
+ 3,
+ 0,
+ 2,
+ 2,
+ 2,
+ 0,
+ 1,
+ 0,
+ 3,
+ 3,
+ 0,
+ 3,
+ 2,
+ 2,
+ 1,
+ 0,
+ 0,
+ 2,
+ 0,
+ 0,
+ 3,
+ 0,
+ 1,
+ 0,
+ 0,
+ 0,
+ 1,
+ 0,
+ 1,
+ 2,
+ 0,
+ 0,
+ 0,
+ 1,
+ 1,
+ 3,
+ 0,
+ 0,
+ 1,
+ 1,
+ 0,
+ 3,
+ 2,
+ 2,
+ 2,
+ 1,
+ 3,
+ 0,
+ 0,
+ 1,
+ 3,
+ 0,
+ 1,
+ 1,
+ 3,
+ 1,
+ 1,
+ 3,
+ 3,
+ 0,
+ 0,
+ 4,
+ 1,
+ 1,
+ 1,
+ 1,
+ 2,
+ 2,
+ 0,
+ 3,
+ 3,
+ 0,
+ 3,
+ 0,
+ 2,
+ 4,
+ 0,
+ 0,
+ 0,
+ 2,
+ 1,
+ 1,
+ 1,
+ 0,
+ 0,
+ 1,
+ 2,
+ 1,
+ 1,
+ 0,
+ 0,
+ 0,
+ 1,
+ 2,
+ 4,
+ 1,
+ 0,
+ 0,
+ 0,
+ 0,
+ 1,
+ -1,
+ 0,
+ 1,
+ 0,
+ 1,
+ 1,
+ 0,
+ 1,
+ 0,
+ 0,
+ 1,
+ 1,
+ 0,
+ 0,
+ 0,
+ 1,
+ 1,
+ 1,
+ 0,
+ 2,
+ 2,
+ 0,
+ 2,
+ 1,
+ 1,
+ 2,
+ 0,
+ 3,
+ 0,
+ 0,
+ 1,
+ 0,
+ 2,
+ 2,
+ 0,
+ -1,
+ 1,
+ 0,
+ 0,
+ 1,
+ 0,
+ 1,
+ 0,
+ 1,
+ 0,
+ 1,
+ 4,
+ 1,
+ 2,
+ 2,
+ 3,
+ 0,
+ 2,
+ 1,
+ 0,
+ 3,
+ 1,
+ 0,
+ 4,
+ 1,
+ 2,
+ 2,
+ 4,
+ 1,
+ 0,
+ 0,
+ 1,
+ 1,
+ 0,
+ 3,
+ 4,
+ 1,
+ 1,
+ 0,
+ -1,
+ 0,
+ 0,
+ 1,
+ 2,
+ 0,
+ 0,
+ 0,
+ 4,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 0,
+ 0,
+ 3,
+ 2,
+ 0,
+ 1,
+ 0,
+ 0,
+ 1,
+ 3,
+ 0,
+ -1,
+ 4,
+ 1,
+ 1,
+ 1,
+ 3,
+ 2,
+ 2,
+ 2,
+ 2,
+ 1,
+ 1,
+ 0,
+ 0,
+ 3,
+ 3,
+ 2,
+ 0,
+ 0,
+ 0,
+ 0,
+ 1,
+ 2,
+ 0,
+ 0,
+ 0,
+ 0,
+ 3,
+ 3,
+ 1,
+ 1,
+ 1,
+ 2,
+ 0,
+ 0,
+ 0,
+ 0,
+ 0,
+ 1,
+ 3,
+ 3,
+ 0,
+ 2,
+ 0,
+ 3,
+ 0,
+ 1,
+ 3,
+ 3,
+ 0,
+ 2,
+ 0,
+ 0,
+ 3,
+ 0,
+ 0,
+ 0,
+ 3,
+ 1,
+ 2,
+ 1,
+ 1,
+ 0,
+ 0,
+ 1,
+ 1,
+ 1,
+ 1,
+ 1,
+ 0,
+ 1,
+ 0,
+ 1,
+ 4,
+ 1,
+ 1,
+ -1,
+ 1,
+ 0,
+ 0,
+ 2,
+ 2,
+ 1,
+ 1,
+ 4,
+ 1,
+ 2,
+ 2,
+ 2,
+ 0,
+ 0,
+ 1,
+ 0,
+ 0,
+ 1
+ ],
+ "topic_sizes": {
+ "2": 58,
+ "0": 151,
+ "3": 44,
+ "1": 129,
+ "4": 16,
+ "-1": 6
+ },
+ "topic_mapper": [
+ [
+ -1,
+ -1,
+ -1
+ ],
+ [
+ 0,
+ 0,
+ 1
+ ],
+ [
+ 1,
+ 1,
+ 0
+ ],
+ [
+ 2,
+ 2,
+ 4
+ ],
+ [
+ 3,
+ 3,
+ 3
+ ],
+ [
+ 4,
+ 4,
+ 2
+ ]
+ ],
+ "topic_labels": {
+ "-1": "-1_inenarrable_queridas_encuadrando_reiniciar",
+ "0": "0_genes_tardes_bellas_mercurio",
+ "1": "1_adentro_venus_mercurio_escorpio",
+ "2": "2_bellas_venus_eclipse_comparto",
+ "3": "3_bellas_mercurio_escorpio_venus",
+ "4": "4_historias_lxs_sinergia_bellas"
+ },
+ "custom_labels": null,
+ "_outliers": 1,
+ "topic_aspects": {}
+ }
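
Reviewer note: the "Topic Frequency" column in the README table and the `topic_sizes` object in `topics.json` do not assign the same counts to the same topic IDs, although both total 404. The sketch below is not part of the commit; it only restates the two listings from this diff and reports where they differ. The helper name `compare_counts` is hypothetical. The diff alone does not establish which listing is intended; the reordering recorded in `topic_mapper` may be related, but that is an assumption.

```python
# "Topic Frequency" column from the README table, keyed by topic ID.
README_FREQUENCY = {-1: 16, 0: 6, 1: 151, 2: 129, 3: 58, 4: 44}

# "topic_sizes" object from topics.json (JSON keys are strings).
TOPIC_SIZES = {"2": 58, "0": 151, "3": 44, "1": 129, "4": 16, "-1": 6}


def compare_counts(readme, sizes):
    """Summarise how the README counts relate to topic_sizes."""
    sizes_by_id = {int(key): value for key, value in sizes.items()}
    differing = {
        topic_id: (readme[topic_id], sizes_by_id.get(topic_id))
        for topic_id in sorted(readme)
        if readme[topic_id] != sizes_by_id.get(topic_id)
    }
    return {
        "readme_total": sum(readme.values()),
        "sizes_total": sum(sizes_by_id.values()),
        # True when both listings contain the same counts in some order.
        "same_counts_any_order": sorted(readme.values()) == sorted(sizes_by_id.values()),
        "differing": differing,
    }


report = compare_counts(README_FREQUENCY, TOPIC_SIZES)
for topic_id, (in_readme, in_sizes) in report["differing"].items():
    print(f"topic {topic_id}: README {in_readme}, topic_sizes {in_sizes}")
```

Both listings contain the same six counts, so the difference is in which topic ID each count is attached to rather than in the counts themselves.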