bal_arxiv_scientific_abstract_berttopic_model

This is a BERTopic model. BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

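from bertopic import BERTopic

# Fetch the model from the Hugging Face Hub and load it
topic_model = BERTopic.load("Rchamba/bal_arxiv_scientific_abstract_berttopic_model")

# Summary table with one row per topic (ID, document count, label)
topic_model.get_topic_info()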
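
Once loaded, the model can also assign topics to new text. The sketch below is a minimal illustration, not part of the original card: `assign_topics` and `demo` are hypothetical helper names, the two example sentences are made up, and it assumes the published model was saved together with a reference to its embedding model so that `transform` can embed unseen documents. If that is not the case, an embedding model would have to be passed to `BERTopic.load`.

```python
from typing import Any, List, Tuple


def assign_topics(topic_model: Any, docs: List[str]) -> List[Tuple[str, int]]:
    """Pair each document with the topic ID predicted by `topic_model`."""
    # transform() returns (topics, probabilities); only the topic IDs are used here.
    topics, _probs = topic_model.transform(docs)
    return [(doc, int(topic)) for doc, topic in zip(docs, topics)]


def demo() -> None:
    """Load the published model and label two made-up abstracts (needs network access)."""
    # Imported lazily so that assign_topics() has no hard dependency on bertopic.
    from bertopic import BERTopic

    topic_model = BERTopic.load("Rchamba/bal_arxiv_scientific_abstract_berttopic_model")
    docs = [  # hypothetical inputs, not taken from the training data
        "We study quantum finite automata with classical control.",
        "A kernel method for classification on small datasets.",
    ]
    for doc, topic in assign_topics(topic_model, docs):
        print(topic, doc)
```

Calling `demo()` downloads the model; a topic ID of -1 means the document was treated as an outlier.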

Topic overview

Number of topics: 15 (including the outlier topic -1)
Number of training documents: 360

Topic ID	Topic Keywords	Topic Frequency	Label
-1	data - secret - steganography - algorithm - manipulation	12	-1_data_secret_steganography_algorithm
0	sp - intelligence - model - processing - theory	50	0_sp_intelligence_model_processing
1	quantum - automata - classical - finite - measurement	44	1_quantum_automata_classical_finite
2	logic - computability - cl - edu - www	35	2_logic_computability_cl_edu
3	tetraquark - bar - vector - rm - qcd	25	3_tetraquark_bar_vector_rm
4	problems - problem - design - combinatorial - clustering	23	4_problems_problem_design_combinatorial
5	prediction - probability - sequence - model - universal	23	5_prediction_probability_sequence_model
6	notes - informal - spaces - fourier - basic	22	6_notes_informal_spaces_fourier
7	citation - science - journals - social - analysis	22	7_citation_science_journals_social
8	orbital - earth - gravitational - artificial - effects	22	8_orbital_earth_gravitational_artificial
9	keyphrases - word - algorithm - similarity - semantic	20	9_keyphrases_word_algorithm_similarity
10	kernel - gmm - datasets - kernels - classification	18	10_kernel_gmm_datasets_kernels
11	problems - csps - constraints - fuzzy - counting	17	11_problems_csps_constraints_fuzzy
12	data - ultrametric - ultrametricity - analysis - structure	14	12_data_ultrametric_ultrametricity_analysis
13	image - vision - processing - content - cognitive	13	13_image_vision_processing_content
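
The table can be sanity-checked in a few lines. The sketch below uses only values copied from the table above: it rebuilds the Label column from the topic ID plus the first four keywords (the pattern visible in every row) and confirms that the per-topic frequencies add up to the 360 training documents. `make_label` is a hypothetical helper written for this illustration, not a BERTopic function.

```python
def make_label(topic_id: int, keywords: str, n_words: int = 4) -> str:
    """Join the topic ID and the first `n_words` keywords with underscores."""
    words = [w.strip() for w in keywords.split(" - ")]
    return "_".join([str(topic_id)] + words[:n_words])


# Three rows copied from the table: (topic ID, keywords, label)
rows = [
    (-1, "data - secret - steganography - algorithm - manipulation",
     "-1_data_secret_steganography_algorithm"),
    (0, "sp - intelligence - model - processing - theory",
     "0_sp_intelligence_model_processing"),
    (13, "image - vision - processing - content - cognitive",
     "13_image_vision_processing_content"),
]
labels_match = all(make_label(tid, kw) == label for tid, kw, label in rows)

# Topic Frequency column, in table order (topic -1 first, then 0 to 13)
frequencies = [12, 50, 44, 35, 25, 23, 23, 22, 22, 22, 20, 18, 17, 14, 13]
total_documents = sum(frequencies)
outlier_share = frequencies[0] / total_documents  # fraction of documents in topic -1

print(labels_match, total_documents, round(outlier_share, 3))
```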

Training hyperparameters

calculate_probabilities: True
language: english
low_memory: False
min_topic_size: 10
n_gram_range: (1, 1)
nr_topics: None
seed_topic_list: None
top_n_words: 10
verbose: False
zeroshot_min_similarity: 0.7
zeroshot_topic_list: None
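
For readers who want to train a comparable model on their own documents, the listed values map directly onto keyword arguments of the `BERTopic` constructor. This is a sketch under assumptions: the card does not state the embedding model or the UMAP and HDBSCAN settings, so the library defaults are used here, and a fresh fit will not necessarily reproduce the topics above. `HYPERPARAMS` and `build_model` are names introduced for illustration.

```python
# Values copied from the "Training hyperparameters" list above.
HYPERPARAMS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_model():
    """Create an unfitted BERTopic instance configured with the listed values."""
    # Imported lazily so the dictionary can be inspected without bertopic installed.
    from bertopic import BERTopic

    # Embedding, UMAP, and HDBSCAN models are left at library defaults (assumption).
    return BERTopic(**HYPERPARAMS)
```

A model built this way would then be trained with `build_model().fit_transform(docs)`, where `docs` is a list of strings.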

Framework versions

Numpy: 2.0.2
HDBSCAN: 0.8.40
UMAP: 0.5.7
Pandas: 2.2.2
Scikit-Learn: 1.6.1
Sentence-transformers: 4.1.0
Transformers: 4.52.4
Numba: 0.60.0
Plotly: 5.24.1
Python: 3.11.13
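
Version mismatches are a common cause of loading problems, so it can help to compare the local environment against the list above. The following sketch uses only the standard library; the keys are the PyPI distribution names for the listed packages (for example `umap-learn` for UMAP), and `version_report` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions copied from the "Framework versions" list, keyed by PyPI distribution name.
EXPECTED = {
    "numpy": "2.0.2",
    "hdbscan": "0.8.40",
    "umap-learn": "0.5.7",
    "pandas": "2.2.2",
    "scikit-learn": "1.6.1",
    "sentence-transformers": "4.1.0",
    "transformers": "4.52.4",
    "numba": "0.60.0",
    "plotly": "5.24.1",
}


def version_report(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str], bool]]:
    """Map each distribution to (expected version, installed version or None, exact match)."""
    report = {}
    for dist, wanted in expected.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[dist] = (wanted, installed, installed == wanted)
    return report


for dist, (wanted, installed, ok) in version_report(EXPECTED).items():
    print(f"{dist}: expected {wanted}, installed {installed}, match={ok}")
```

An exact match is not necessarily required for the model to load; the report only shows where the environment differs from the one used for training.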