bal_arxiv_scientific_abstract_berttopic_model

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

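from bertopic import BERTopic

# Download the model from the Hugging Face Hub and load it
topic_model = BERTopic.load("Rchamba/bal_arxiv_scientific_abstract_berttopic_model")

# Return a DataFrame summarizing every topic (ID, document count, label, keywords)
topic_model.get_topic_info()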
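
Once loaded, the model can also assign topics to abstracts it has not seen. The following is a minimal sketch, assuming the loaded model is able to embed new text (that is, the embedding model it was saved with is available locally or on the Hub). `summarize_assignments` is a hypothetical helper written for this example, not part of BERTopic; it relies only on the `transform` and `get_topic` methods.

```python
def summarize_assignments(topic_model, docs, top_k=5):
    """Assign each document to a topic and list that topic's top keywords.

    `topic_model` is expected to be a loaded BERTopic model; only its
    `transform` and `get_topic` methods are used here.
    """
    # transform returns one topic ID per document, plus probabilities
    topics, _probs = topic_model.transform(docs)

    rows = []
    for doc, topic_id in zip(docs, topics):
        topic_id = int(topic_id)
        # get_topic returns a list of (word, score) pairs, or False if unknown
        words = topic_model.get_topic(topic_id) or []
        keywords = [word for word, _score in words[:top_k]]
        rows.append({"document": doc, "topic": topic_id, "keywords": keywords})
    return rows
```

After loading the model as shown above, calling `summarize_assignments(topic_model, ["some new abstract text"])` would return one dictionary per document. A topic ID of -1 means the document was treated as an outlier.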

Topic overview

  • Number of topics: 15
  • Number of training documents: 360

Topic ID | Topic Keywords | Topic Frequency | Label
-1 | data - secret - steganography - algorithm - manipulation | 12 | -1_data_secret_steganography_algorithm
0 | sp - intelligence - model - processing - theory | 50 | 0_sp_intelligence_model_processing
1 | quantum - automata - classical - finite - measurement | 44 | 1_quantum_automata_classical_finite
2 | logic - computability - cl - edu - www | 35 | 2_logic_computability_cl_edu
3 | tetraquark - bar - vector - rm - qcd | 25 | 3_tetraquark_bar_vector_rm
4 | problems - problem - design - combinatorial - clustering | 23 | 4_problems_problem_design_combinatorial
5 | prediction - probability - sequence - model - universal | 23 | 5_prediction_probability_sequence_model
6 | notes - informal - spaces - fourier - basic | 22 | 6_notes_informal_spaces_fourier
7 | citation - science - journals - social - analysis | 22 | 7_citation_science_journals_social
8 | orbital - earth - gravitational - artificial - effects | 22 | 8_orbital_earth_gravitational_artificial
9 | keyphrases - word - algorithm - similarity - semantic | 20 | 9_keyphrases_word_algorithm_similarity
10 | kernel - gmm - datasets - kernels - classification | 18 | 10_kernel_gmm_datasets_kernels
11 | problems - csps - constraints - fuzzy - counting | 17 | 11_problems_csps_constraints_fuzzy
12 | data - ultrametric - ultrametricity - analysis - structure | 14 | 12_data_ultrametric_ultrametricity_analysis
13 | image - vision - processing - content - cognitive | 13 | 13_image_vision_processing_content
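
The frequencies in the table can be checked against the stated number of training documents. The short sketch below copies the counts by hand from the table (nothing is read from the model itself) and computes each topic's share of the corpus. In BERTopic, topic -1 collects outlier documents that were not assigned to any cluster.

```python
# Topic frequencies copied from the table above (topic ID -> document count).
TOPIC_FREQUENCIES = {
    -1: 12, 0: 50, 1: 44, 2: 35, 3: 25, 4: 23, 5: 23, 6: 22,
    7: 22, 8: 22, 9: 20, 10: 18, 11: 17, 12: 14, 13: 13,
}


def topic_shares(frequencies):
    """Return each topic's fraction of all documents."""
    total = sum(frequencies.values())
    return {topic_id: count / total for topic_id, count in frequencies.items()}


total_docs = sum(TOPIC_FREQUENCIES.values())
shares = topic_shares(TOPIC_FREQUENCIES)
largest = max(TOPIC_FREQUENCIES, key=TOPIC_FREQUENCIES.get)

print(f"documents: {total_docs}")                          # documents: 360
print(f"outlier share: {shares[-1]:.1%}")                  # outlier share: 3.3%
print(f"largest topic: {largest} ({shares[largest]:.1%})")  # largest topic: 0 (13.9%)
```

The counts sum to 360, matching the number of training documents, and the 15 rows (including the outlier topic -1) match the stated number of topics.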

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
  • zeroshot_min_similarity: 0.7
  • zeroshot_topic_list: None
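
For reference, the values above correspond to keyword arguments of the `BERTopic` constructor. The sketch below shows how a model with the same settings might be configured. It is not the author's training script: the card does not list the embedding model or the UMAP and HDBSCAN settings, so the library defaults are assumed here, and fitting this on the same data would not necessarily reproduce the topics above. `build_topic_model` is a hypothetical helper name.

```python
# Hyperparameters as listed in this card.
HYPERPARAMS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_topic_model():
    """Create an unfitted BERTopic model using the settings listed above."""
    # Imported lazily so the settings can be inspected without BERTopic installed.
    from bertopic import BERTopic

    # Assumption: embedding model, UMAP and HDBSCAN are left at library defaults.
    return BERTopic(**HYPERPARAMS)
```

Calling `build_topic_model().fit_transform(docs)` on a list of abstracts would then train a new model with these settings.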

Framework versions

  • Numpy: 2.0.2
  • HDBSCAN: 0.8.40
  • UMAP: 0.5.7
  • Pandas: 2.2.2
  • Scikit-Learn: 1.6.1
  • Sentence-transformers: 4.1.0
  • Transformers: 4.52.4
  • Numba: 0.60.0
  • Plotly: 5.24.1
  • Python: 3.11.13
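
Loading a saved model tends to be most reliable when the local environment is close to the one listed above. The sketch below compares installed package versions against the card using only the standard library. The PyPI distribution names (for example `umap-learn` for UMAP) are assumptions made for this example, and `version_mismatches` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by assumed PyPI distribution name.
CARD_VERSIONS = {
    "numpy": "2.0.2",
    "hdbscan": "0.8.40",
    "umap-learn": "0.5.7",
    "pandas": "2.2.2",
    "scikit-learn": "1.6.1",
    "sentence-transformers": "4.1.0",
    "transformers": "4.52.4",
    "numba": "0.60.0",
    "plotly": "5.24.1",
}


def version_mismatches(expected):
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


for package, (wanted, installed) in version_mismatches(CARD_VERSIONS).items():
    print(f"{package}: card lists {wanted}, installed {installed}")
```

A mismatch does not necessarily mean the model will fail to load; the check only flags where the environments differ.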