SentenceTransformer based on sentence-transformers/allenai-specter

This is a sentence-transformers model fine-tuned from sentence-transformers/allenai-specter. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: sentence-transformers/allenai-specter
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
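
Cosine similarity, the function listed above, compares the direction of two embeddings and ignores their length. The following is a minimal sketch in plain Python, using toy 3-dimensional vectors as stand-ins for the 768-dimensional outputs; the helper names are illustrative and are not part of the Sentence Transformers API.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise cosine similarities, one row per input vector."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy vectors (assumption: real embeddings would have 768 entries each).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],  # same direction as the first, different length
    [0.0, 3.0, 0.0],  # orthogonal to the first two
]
matrix = similarity_matrix(toy_embeddings)
print(matrix)
```

Because only direction matters, the first two toy vectors score 1.0 against each other even though their lengths differ, while the orthogonal third vector scores 0.0 against both.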

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
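
The Pooling configuration above enables only pooling_mode_cls_token, so the sentence embedding is the vector of the first ([CLS]) token rather than an average over tokens. A small sketch of the difference, written in plain Python with made-up 2-dimensional token vectors (the function names are illustrative, not library calls):

```python
def cls_pooling(token_embeddings):
    """Return the first token's vector ([CLS]) as the sequence embedding."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    return list(token_embeddings[0])


def mean_pooling(token_embeddings, attention_mask):
    """Average the vectors of non-padding tokens (disabled in this model)."""
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask excludes every token")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy sequence: [CLS], one word token, one padding token (values are invented).
tokens = [[0.5, -1.0], [1.5, 3.0], [9.0, 9.0]]
mask = [1, 1, 0]

cls_vec = cls_pooling(tokens)
mean_vec = mean_pooling(tokens, mask)
print(cls_vec, mean_vec)
```

With CLS pooling the other token vectors do not enter the result directly; they influence it only through the transformer's attention layers.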

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("nadrajak/allenai-specter-ft2")
# Run inference
sentences = [
    "We describe azimuth structure commonly associated with elliptic and directed flow in the context of 2D angular autocorrelations for the purpose of precise separation of so-called nonflow (mainly minijets) from flow. We extend the Fourier-transform description of azimuth structure to include power spectra and autocorrelations related by the Wiener-Khintchine theorem. We analyze several examples of conventional flow analysis in that context and question the relevance of reaction plane estimation to flow analysis. We introduce the 2D angular autocorrelation with examples from data analysis and describe a simulation exercise which demonstrates precise separation of flow and nonflow using the 2D autocorrelation method. We show that an alternative correlation measure based on Pearson's normalized covariance provides a more intuitive measure of azimuth structure.",
    'It is a brief review on composing and solving Infrared Evolution Equations. They can be used in order to calculate amplitudes of high-energy reactions in different kinematic regions in the double-logarithmic approximation.',
    'Moeller\'s energy-momentum complex is employed in order to determine the energy and momentum distributions for a spacetime described by a "generalized Schwarzschild" geometry in (3+1)-dimensions on a noncommutative curved D3-brane in an effective, open bosonic string theory. The geometry considered is obtained by an effective theory of gravity coupled with a nonlinear electromagnetic field and depends only on the generalized (effective) mass and charge which incorporate corrections of first order in the noncommutativity parameter.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])

Evaluation

Metrics

Triplet

Dataset: triplet_eval
Evaluated with TripletEvaluator

Metric	Value
cosine_accuracy	0.949

Triplet

Dataset: triplet_eval
Evaluated with TripletEvaluator

Metric	Value
cosine_accuracy	0.947
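
As a rough illustration of what a triplet cosine accuracy measures, the sketch below counts a triplet as correct when the anchor is more similar to the positive than to the negative, and reports the fraction of correct triplets. This is an assumed reading of the metric with no extra margin, written in plain Python on invented 2-dimensional vectors; it is not the TripletEvaluator implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def triplet_cosine_accuracy(triplets):
    """Fraction of (anchor, positive, negative) triplets ranked correctly."""
    if not triplets:
        raise ValueError("need at least one triplet")
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative)
    )
    return correct / len(triplets)


# Invented toy triplets; the third one is deliberately ranked wrongly.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),
    ([1.0, 1.0], [1.0, -1.0], [1.0, 0.9]),
    ([1.0, 0.0], [1.0, 0.2], [-1.0, 0.0]),
]
accuracy = triplet_cosine_accuracy(toy_triplets)
print(accuracy)
```

Under this reading, a value of 0.947 on 1,000 evaluation triplets would correspond to 947 correctly ranked triplets.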

Training Details

Training Dataset

Unnamed Dataset

Size: 10,000 training samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 36 tokens mean: 170.2 tokens max: 512 tokens	min: 40 tokens mean: 173.14 tokens max: 512 tokens	min: 38 tokens mean: 167.35 tokens max: 512 tokens

Samples:

anchor	positive	negative
We study the notion of the scaled entropy of a filtration of $\sigma$-fields (= decreasing sequence of $\sigma$-fields) introduced by the first author ({V4}). We suggest a method for computing this entropy for the sequence of $\sigma$-fields of pasts of a Markov process determined by a random walk over the trajectories of a Bernoulli action of a commutative or nilpotent countable group (Theorems5,6). Since the scaled entropy is a metric invariant of the filtration, it follows that the sequences of $\sigma$-fields of pasts of random walks over the trajectories of Bernoulli actions of lattices (groups ${\Bbb Z}^d$) are metrically nonisomorphic for different dimensions $d$, and for the same $d$ but different values of the entropy of the Bernoulli scheme. We give a brief survey of the metric theory of filtrations, in particular, formulate the standardness criterion and describe its connections with the scaled entropy and the notion of a tower of measures.	In this paper we complete a classification of finite linear spaces $\cS$ with line size at most 12 admitting a line-transitive point-imprimitive subgroup of automorphisms. The examples are the Desarguesian projective planes of orders $4,7, 9$ and 11, two designs on 91 points with line size 6, and 467 designs on 729 points with line size 8.	We show that the combined data from solar, long-baseline and reactor neutrino experiments can exclude the generalized bicycle model of Lorentz noninvariant direction-dependent and/or direction-independent oscillations of massless neutrinos. This model has five parameters, which is more than is needed in standard oscillation phenomenology with neutrino masses. Solar data alone are sufficient to exclude the pure direction-dependent case. The combination of solar and long-baseline data rules out the pure direction-independent case. With the addition of KamLAND data, a mixture of direction-dependent and direction-independent terms in the effective Hamiltonian is also excluded.
We discuss a numerical model for black hole growth and its associated feedback processes that for the first time allows cosmological simulations of structure formation to self-consistently follow the build up of the cosmic population of galaxies and active galactic nuclei. Our model assumes that seed black holes are present at early cosmic epochs at the centres of forming halos. We then track their growth from gas accretion and mergers with other black holes in the course of cosmic time. For black holes that are active, we distinguish between two distinct modes of feedback, depending on the black hole accretion rate itself. Black holes that accrete at high rates are assumed to be in a `quasar regime', where we model their feedback by thermally coupling a small fraction of their bolometric luminosity to the surrounding gas. For black holes with low accretion rates, we conjecture that most of their feedback occurs in mechanical form, where AGN-driven bubbles are injected into a gaseous e...	Context: L'-band (3.8 micron) images of the Galactic Center show a large number of thin filaments in the mini-spiral, located west of the mini-cavity and along the inner edge of the Northern Arm. One possible mechanism that could produce such structures is the interaction of a central wind with the mini-spiral. Additionally, we identify similar features that appear to be associated with stars. Aims: We present the first proper motion measurements of the thin dust filaments observed in the central parsec around SgrA* and investigate possible mechanisms that could be responsible for the observed motions. Methods: The observations have been carried out using the NACO adaptive optics system at the ESO VLT. The images have been transformed to a common coordinate system and features of interest were extracted. Then a cross-correlation technique could be performed in order to determine the offsets between the features with respect to their position in the reference epoch. Results: We derive t...	We consider a social system of interacting heterogeneous agents with learning abilities, a model close to Random Field Ising Models, where the random field corresponds to the idiosyncratic willingness to pay. Given a fixed price, agents decide repeatedly whether to buy or not a unit of a good, so as to maximize their expected utilities. We show that the equilibrium reached by the system depends on the nature of the information agents use to estimate their expected utilities.
Low-energy dipole excitations have been investigated theoretically in N=50, several N=82 isotones and the Z=50 Sn isotopes. For this purpose a method incorporating both HFB and multi-phonon QPM theory is applied. A concentration of one-phonon dipole strength located below the neutron emission threshold has been calculated in these nuclei. The analysis of the corresponding neutron and proton dipole transition densities allows to assign a genuine pattern to the low-energy excitations and making them distinct from the conventional GDR modes. Analyzing also the QRPA wave functions of the states we can identify these excitations as Pygmy Dipole Resonance (PDR) modes, recently studied also in Sn and N=82 nuclei. The results for N=50 are exploratory for an experimental project designed for the bremsstrahlung facility at the ELBE accelerator.	The NA60 experiment is a fixed-target experiment at the CERN SPS. It has measured the dimuon yield in Indium--Indium collisions with an In beam of 158 AGeV/c and in p-A collisions with a proton beam of 400 and 158 AGeV/c. The results allow to address three important physics topics, namely the study of the rho spectral function in nuclear collisions, the clarification of the origin of the dimuon excess measured by NA50 in the intermediate mass range, and the J/psi suppression pattern in a collision system different from Pb-Pb. An overview of these results will be given in this paper.	We examine the stability of a trapped dipolar condensate mixed with a single-component fermion gas at T=0. Whereas pure dipolar condensates with small s-wave interaction are unstable even for small dipole-dipole interaction strength, we find that the admixture of fermions can significantly stabilize them, depending on the strength of the boson-fermion interaction. Within the stable regime we find a region where a ground state is characterized by a density wave along the soft trap direction.

Loss: TripletLoss with these parameters:

{
    "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
    "triplet_margin": 5
}
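
With a Euclidean distance metric and a margin of 5, a triplet loss of the usual form is max(d(anchor, positive) - d(anchor, negative) + margin, 0). The sketch below works through that arithmetic in plain Python on invented 2-dimensional points; the function names are illustrative and the per-triplet formula is stated as an assumption about the standard definition, not copied from the training code.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=5.0):
    """Hinge on the gap between positive and negative distances."""
    gap = euclidean_distance(anchor, positive) - euclidean_distance(anchor, negative)
    return max(gap + margin, 0.0)


anchor = [0.0, 0.0]
positive = [3.0, 4.0]        # distance 5 from the anchor
hard_negative = [0.0, 1.0]   # distance 1: closer than the positive
far_negative = [0.0, 12.0]   # distance 12: beyond positive distance + margin

hard_loss = triplet_loss(anchor, positive, hard_negative)  # 5 - 1 + 5
far_loss = triplet_loss(anchor, positive, far_negative)    # 5 - 12 + 5, clipped at 0
print(hard_loss, far_loss)
```

The loss reaches zero only once the negative is at least the margin farther from the anchor than the positive, so a larger margin keeps more triplets contributing gradient.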

Evaluation Dataset

Unnamed Dataset

Size: 1,000 evaluation samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 39 tokens mean: 175.26 tokens max: 512 tokens	min: 39 tokens mean: 173.33 tokens max: 512 tokens	min: 39 tokens mean: 167.47 tokens max: 512 tokens

Samples:

anchor	positive	negative
We disprove a conjecture of A. Koldobsky asking whether it is enough to compare $(n-2)$-derivatives of the projection functions of two symmetric convex bodies in the Shephard problem in order to get a positive answer in all dimensions.	The projective degrees of strict partitions of n were computed for all n < 101 and the partitions with maximal projective degree were found for each n. It was observed that maximizing partitions for successive values of n "lie close to each other" in a certain sense. Conjecturing that this holds for larger values of n, the partitions of maximal degree were computed for all n < 221. The results are consistent with a recent conjecture on the limiting shape of the strict partition of maximal projective degree.	In [1] was considered the superintegrable system which describes the magnetic dipole with spin 1/2 (neutron) in the field of linear current. Here we present its generalization for any spin which preserves superintegrability. The dynamical symmetry stays the same as it is for spin 1/2.
We develop a method for measuring and localizing homology classes. This involves two problems. First, we define relevant notions of size for both a homology class and a homology group basis, using ideas from relative homology. Second, we propose an algorithm to compute the optimal homology basis, using techniques from persistent homology and finite field algebra. Classes of the computed optimal basis are localized with cycles conveying their sizes. The algorithm runs in $O(\beta^4 n^3 \log^2 n)$ time, where $n$ is the size of the simplicial complex and $\beta$ is the Betti number of the homology group.	We consider two-way wire-tap channels, where two users are communicating with each other in the presence of an eavesdropper, who has access to the communications through a multiple-access channel. We find achievable rates for two different scenarios, the Gaussian two-way wire-tap channel, (GTW-WT), and the binary additive two-way wire-tap channel, (BATW-WT). It is shown that the two-way channels inherently provide a unique advantage for wire-tapped scenarios, as the users know their own transmitted signals and in effect help encrypt the other user's messages, similar to a one-time pad. We compare the achievable rates to that of the Gaussian multiple-access wire-tap channel (GMAC-WT) to illustrate this advantage.	We report quantitative relations between corruption level and economic factors, such as country wealth and foreign investment per capita, which are characterized by a power law spanning multiple scales of wealth and investments per capita. These relations hold for diverse countries, and also remain stable over different time periods. We also observe a negative correlation between level of corruption and long-term economic growth. We find similar results for two independent indices of corruption, suggesting that the relation between corruption and wealth does not depend on the specific measure of corruption. The functional relations we report have implications when assessing the relative level of corruption for two countries with comparable wealth, and for quantifying the impact of corruption on economic growth and foreign investments.
The paper addresses the space-frequency correlations of electromagnetic waves in general random, bi-anisotropic media whose constitutive tensors are complex Hermitian matrices. The two-frequency Wigner distribution (2f-WD) for polarized waves is introduced to describe the space-frequency correlations and the closed form Wigner-Moyal equation is derived from the Maxwell equations. Two-frequency radiative transfer (2f-RT) equations is then derived from the Wigner-Moyal equation by using the multiple scale expansion. For the simplest isotropic medium, the result coincides with Chandrasekhar's transfer equation. In birefringent media, the 2f-RT equations take the scalar form due to the absence of depolarization. A number of birefringent media such as the chiral, uniaxial and gyrotropic media are examined. For the unpolarized wave in the isotropic medium the 2f-RT equations reduces to the Fokker-Planck equation previously derived in Part I. A similar Fokker-Planck equation is derived from t...	In this paper, it is shown that the cosmological model that was introduced in a sequence of three earlier papers under the title, A Dust Universe Solution to the Dark Energy Problem, can be used to resolve the problem of the great mismatch of numerical values between dark energy from cosmology and zero point energy from quantum theory. It is shown that, if the zero point energies for the cosmic microwave background and for all the rest of the universe that is not cosmic microwave background are introduced into this model as two entities, their separate values appear within this theory in the form of a numerical difference. It is this difference that gives the numerical value for the zero point value of Einstein's dark energy density. Consequently, although the two zero point energies may be large, their difference can give the known small dark energy value from cosmology for dark energy density. Issues relating to interpretation, calculation and measurement associated with this result ...	We demonstrate spin injection into a graphene thin film with high reliability by using non-local magnetoresistance (MR) measurements, in which the electric current path is completely separated from the spin current path. Using these non-local measurements, an obvious MR effect was observed at room temperature; and the MR effect was ascribed to magnetization reversal of ferromagnetic electrodes. This result is a direct demonstration of spin injection into a graphene thin film. Furthermore, this is the first report of spin injection into molecules at room temperature.

Loss: TripletLoss with these parameters:

{
    "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
    "triplet_margin": 5
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
gradient_accumulation_steps: 2
learning_rate: 2e-05
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: True
dataloader_num_workers: 1
dataloader_pin_memory: False

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 8
per_device_eval_batch_size: 8
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 2
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 3
max_steps: -1
lr_scheduler_type: cosine
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 1
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: False
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
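
The hyperparameters above determine how many optimizer steps the run takes and how the learning rate evolves. The sketch below redoes that arithmetic in plain Python. It assumes a single training device (not stated in the card, but consistent with the logged epoch 0.8 at step 500) and uses the common linear-warmup-then-cosine-decay formula, which may differ in detail from the trainer's scheduler.

```python
import math

# Values taken from the card.
DATASET_SIZE = 10_000      # training samples
PER_DEVICE_BATCH = 8       # per_device_train_batch_size
GRAD_ACCUM = 2             # gradient_accumulation_steps
NUM_EPOCHS = 3             # num_train_epochs
BASE_LR = 2e-5             # learning_rate
WARMUP_RATIO = 0.1         # warmup_ratio
NUM_DEVICES = 1            # assumption: single GPU

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES
steps_per_epoch = DATASET_SIZE // effective_batch
total_steps = steps_per_epoch * NUM_EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def learning_rate(step):
    """Linear warmup to BASE_LR, then cosine decay to zero (assumed form)."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


def epoch_at(step):
    """Fractional epoch reached after a given number of optimizer steps."""
    return step / steps_per_epoch


print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
print(epoch_at(500), learning_rate(500))
```

Under these assumptions each optimizer step consumes 16 samples, one epoch is 625 steps, and the full run is 1,875 steps with roughly the first 188 spent in warmup.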

Training Logs

Epoch	Step	Training Loss	Validation Loss	triplet_eval_cosine_accuracy
-1	-1	-	-	0.8210
0.8	500	2.5031	0.7956	0.9410
1.6	1000	1.0464	0.7594	0.9450
2.4	1500	0.5218	0.7086	0.9480
-1	-1	-	-	0.9470

Framework Versions

Python: 3.11.13
Sentence Transformers: 4.1.0
Transformers: 4.52.4
PyTorch: 2.6.0+cu124
Accelerate: 1.8.1
Datasets: 2.14.4
Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
