xsum_6789_5000000_2500000_v1_50topics_train

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic

# Load the saved topic model from the Hugging Face Hub
topic_model = BERTopic.load("KingKazma/xsum_6789_5000000_2500000_v1_50topics_train")

# Return a table with one row per topic
topic_model.get_topic_info()
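Once a model is loaded, the top words per topic can be collected in a few lines. The following is a minimal sketch, not part of the original card: `summarize_topics` is a hypothetical helper name, and it assumes that `get_topic_info()` returns a table with a `Topic` column and that `get_topic(topic_id)` returns a list of `(word, score)` pairs.

```python
def summarize_topics(topic_model, top_k=5):
    """Return {topic_id: [top words]} for a loaded BERTopic model.

    Hypothetical helper: it only relies on get_topic_info() exposing a
    "Topic" column and get_topic() returning (word, score) pairs.
    """
    info = topic_model.get_topic_info()
    summary = {}
    for topic_id in info["Topic"]:
        words = topic_model.get_topic(topic_id)
        # Keep only the words, dropping their scores.
        summary[int(topic_id)] = [word for word, _score in words[:top_k]]
    return summary
```

With the model loaded as shown above, `summarize_topics(topic_model)` would return a dictionary keyed by topic ID.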

Topic overview

  • Number of topics: 50 (topic IDs -1 to 48; -1 is the outlier topic)
  • Number of training documents: 204045
Overview of all topics:
Topic ID | Topic Keywords | Topic Frequency | Label
-1 | said - mr - would - one - year | 6 | -1_said_mr_would_one
0 | win - game - half - second - right | 120934 | 0_win_game_half_second
1 | said - would - labour - eu - party | 18917 | 1_said_would_labour_eu
2 | said - police - mr - court - would | 14269 | 2_said_police_mr_court
3 | mr - president - said - government - us | 12871 | 3_mr_president_said_government
4 | bank - company - sale - said - year | 9344 | 4_bank_company_sale_said
5 | fire - said - flood - people - water | 2912 | 5_fire_said_flood_people
6 | airport - flight - space - plane - aircraft | 2642 | 6_airport_flight_space_plane
7 | trump - mr - us - clinton - said | 2601 | 7_trump_mr_us_clinton
8 | energy - oil - rate - growth - price | 2301 | 8_energy_oil_rate_growth
9 | sea - water - ship - said - coastguard | 2183 | 9_sea_water_ship_said
10 | dog - animal - bird - said - zoo | 1899 | 10_dog_animal_bird_said
11 | president - fifa - mr - government - cuba | 1707 | 11_president_fifa_mr_government
12 | film - bbc - actor - star - show | 1665 | 12_film_bbc_actor_star
13 | drug - alcohol - smoking - pollution - health | 1340 | 13_drug_alcohol_smoking_pollution
14 | music - album - song - band - chart | 1193 | 14_music_album_song_band
15 | virus - ebola - health - infection - vaccine | 827 | 15_virus_ebola_health_infection
16 | yn - ar - wedi - ei - bod | 765 | 16_yn_ar_wedi_ei
17 | robot - car - research - technology - science | 560 | 17_robot_car_research_technology
18 | india - indian - delhi - indias - modi | 469 | 18_india_indian_delhi_indias
19 | found - museum - site - fossil - roman | 431 | 19_found_museum_site_fossil
20 | church - bishop - pope - vatican - cardinal | 429 | 20_church_bishop_pope_vatican
21 | australia - australian - mr - nauru - prime | 381 | 21_australia_australian_mr_nauru
22 | unsupported - updated - playback - device - media | 342 | 22_unsupported_updated_playback_device
23 | prince - royal - queen - duchess - duke | 293 | 23_prince_royal_queen_duchess
24 | marriage - gay - law - samesex - transgender | 266 | 24_marriage_gay_law_samesex
25 | updated - bst - gmt - last - 2017 | 235 | 25_updated_bst_gmt_last
26 | food - product - meat - horsemeat - cheese | 232 | 26_food_product_meat_horsemeat
27 | woman - fashion - dress - women - wear | 209 | 27_woman_fashion_dress_women
28 | book - novel - author - prize - writer | 202 | 28_book_novel_author_prize
29 | mountain - avalanche - climber - everest - snow | 201 | 29_mountain_avalanche_climber_everest
30 | trident - defence - submarine - nuclear - army | 180 | 30_trident_defence_submarine_nuclear
31 | christmas - bell - cake - poppy - minster | 152 | 31_christmas_bell_cake_poppy
32 | cosby - clown - constand - mr - cosbys | 144 | 32_cosby_clown_constand_mr
33 | water - utilities - customer - company - said | 125 | 33_water_utilities_customer_company
34 | sleep - suicide - life - people - health | 118 | 34_sleep_suicide_life_people
35 | flag - confederate - white - university - statue | 115 | 35_flag_confederate_white_university
36 | nba - warriors - lakers - cavaliers - game | 109 | 36_nba_warriors_lakers_cavaliers
37 | picture - scotlandpicturesbbccouk - bbcscotlandpics - photo - selection | 98 | 37_picture_scotlandpicturesbbccouk_bbcscotlandpics_photo
38 | flag - emojis - deaf - emoji - language | 89 | 38_flag_emojis_deaf_emoji
39 | ring - diamond - jewellery - carat - jewel | 65 | 39_ring_diamond_jewellery_carat
40 | pokemon - game - go - ai - player | 49 | 40_pokemon_game_go_ai
41 | takata - airbags - recall - honda - airbag | 37 | 41_takata_airbags_recall_honda
42 | follow - - - - | 35 | 42_follow___
43 | depp - joyce - dog - boo - pistol | 29 | 43_depp_joyce_dog_boo
44 | leaguebyleague - list - managerial - below - appear | 26 | 44_leaguebyleague_list_managerial_below
45 | name - top - girls - boys - boy | 17 | 45_name_top_girls_boys
46 | film - potter - scotland - beasts - grindelwald | 15 | 46_film_potter_scotland_beasts
47 | balcony - irish - berkeley - student - donohoe | 10 | 47_balcony_irish_berkeley_student
48 | flower - garden - flowered - botanic - arum | 6 | 48_flower_garden_flowered_botanic
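The topic frequencies are heavily skewed toward the first few topics, which is easier to see as shares of the 204045 training documents. The sketch below is illustrative only: it copies a subset of counts from the table above, and `topic_share` is a hypothetical helper name.

```python
# Document counts copied from the topic table (first rows only, for brevity).
TOTAL_DOCUMENTS = 204045
topic_frequency = {
    -1: 6,        # outlier topic
    0: 120934,
    1: 18917,
    2: 14269,
    3: 12871,
    4: 9344,
}


def topic_share(topic_id):
    """Fraction of all training documents assigned to the given topic."""
    return topic_frequency[topic_id] / TOTAL_DOCUMENTS


for topic_id in topic_frequency:
    print(f"{topic_id:>3}: {topic_share(topic_id):.1%}")
```

Topic 0 alone accounts for roughly 59% of the documents, while the outlier topic -1 holds only 6 of them.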

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: 50
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
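The settings above can be passed to a fresh BERTopic instance. This is a minimal sketch, assuming the listed names map directly onto `BERTopic` constructor arguments. The card does not list the embedding model or the UMAP and HDBSCAN settings, so they stay at library defaults here and may differ from the original training run. `build_model` is a hypothetical helper name.

```python
# Hyperparameters as listed in this card.
HYPERPARAMS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": 50,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_model():
    """Create an untrained BERTopic model with the settings above."""
    # Imported lazily so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMS)
```

Calling `build_model().fit_transform(docs)` on a list of strings would then train a new model; it will not necessarily reproduce the topics listed in this card.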

Framework versions

  • Numpy: 1.23.5
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.57.1
  • Plotly: 5.15.0
  • Python: 3.10.12
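Because saved models can be sensitive to library versions, it may help to compare the local environment against the versions listed above before loading. The sketch below uses only the standard library. The PyPI distribution names (for example `umap-learn` for UMAP) are assumptions, since the card lists display names only, and `compare_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.15.0",
}


def compare_versions(expected=EXPECTED):
    """Return {name: (expected, installed, matches)} for each package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed locally
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, ok) in compare_versions().items():
    print(f"{name}: expected {wanted}, installed {installed}, match={ok}")
```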