SentenceTransformer based on sentence-transformers/allenai-specter
This is a sentence-transformers model finetuned from sentence-transformers/allenai-specter. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/allenai-specter
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
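Because the model compares embeddings with cosine similarity, a score depends only on the angle between two vectors, not on their length. The following is a minimal pure-Python sketch of that measure; the toy 3-dimensional vectors stand in for the 768-dimensional embeddings, and the helper name `cosine_similarity` is illustrative rather than part of the library.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors (assumption): real embeddings from this model have 768 dimensions.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])

print(same_direction, orthogonal)
```

Vectors pointing in the same direction score close to 1 regardless of scale, and orthogonal vectors score 0.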
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
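The pooling configuration above enables only `pooling_mode_cls_token`, so the sentence embedding is the output vector of the first ([CLS]) token rather than an average over all tokens. Here is a minimal sketch of that selection using plain Python lists; the function name `cls_pooling` and the tiny 4-dimensional batch are illustrative assumptions, not the library's implementation.

```python
def cls_pooling(token_embeddings):
    """Return the first token's vector for every sequence in the batch.

    token_embeddings: nested list shaped [batch][tokens][dimensions].
    """
    return [sequence[0] for sequence in token_embeddings]


# Hypothetical batch: 2 sequences, 3 tokens each, 4 dimensions per token
# (the real model produces 768 dimensions per token).
batch = [
    [[0.1, 0.2, 0.3, 0.4], [1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]],
    [[0.5, 0.6, 0.7, 0.8], [3.0, 3.0, 3.0, 3.0], [4.0, 4.0, 4.0, 4.0]],
]
sentence_vectors = cls_pooling(batch)

print(sentence_vectors)
```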
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("nadrajak/allenai-specter-ft2")
# Run inference
sentences = [
"We describe azimuth structure commonly associated with elliptic and directed flow in the context of 2D angular autocorrelations for the purpose of precise separation of so-called nonflow (mainly minijets) from flow. We extend the Fourier-transform description of azimuth structure to include power spectra and autocorrelations related by the Wiener-Khintchine theorem. We analyze several examples of conventional flow analysis in that context and question the relevance of reaction plane estimation to flow analysis. We introduce the 2D angular autocorrelation with examples from data analysis and describe a simulation exercise which demonstrates precise separation of flow and nonflow using the 2D autocorrelation method. We show that an alternative correlation measure based on Pearson's normalized covariance provides a more intuitive measure of azimuth structure.",
'It is a brief review on composing and solving Infrared Evolution Equations. They can be used in order to calculate amplitudes of high-energy reactions in different kinematic regions in the double-logarithmic approximation.',
'Moeller\'s energy-momentum complex is employed in order to determine the energy and momentum distributions for a spacetime described by a "generalized Schwarzschild" geometry in (3+1)-dimensions on a noncommutative curved D3-brane in an effective, open bosonic string theory. The geometry considered is obtained by an effective theory of gravity coupled with a nonlinear electromagnetic field and depends only on the generalized (effective) mass and charge which incorporate corrections of first order in the noncommutativity parameter.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
Evaluation
Metrics
Triplet
- Dataset: triplet_eval
- Evaluated with TripletEvaluator
Metric | Value |
---|---|
cosine_accuracy | 0.949 |
Triplet
- Dataset: triplet_eval
- Evaluated with TripletEvaluator
Metric | Value |
---|---|
cosine_accuracy | 0.947 |
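As a rough guide to reading the cosine_accuracy values above, the metric can be understood as the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative. The sketch below illustrates that idea in pure Python on made-up 2-dimensional vectors; the helper names are hypothetical, and the actual evaluator may differ in details such as an optional margin.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def triplet_cosine_accuracy(triplets):
    """Share of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative)
    )
    return correct / len(triplets)


# Made-up toy embeddings (assumption), not taken from the evaluation set.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),    # positive is closer: counted as correct
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),    # positive is closer: counted as correct
    ([1.0, 1.0], [1.0, -1.0], [1.0, 0.9]),   # negative is closer: counted as incorrect
    ([1.0, 0.0], [1.0, 0.1], [-1.0, 0.0]),   # positive is closer: counted as correct
]
accuracy = triplet_cosine_accuracy(toy_triplets)

print(accuracy)
```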
Training Details
Training Dataset
Unnamed Dataset
- Size: 10,000 training samples
- Columns: anchor, positive, and negative
- Approximate statistics based on the first 1000 samples:
Column | Type | Min | Mean | Max |
---|---|---|---|---|
anchor | string | 36 tokens | 170.2 tokens | 512 tokens |
positive | string | 40 tokens | 173.14 tokens | 512 tokens |
negative | string | 38 tokens | 167.35 tokens | 512 tokens |
- Samples:
anchor | positive | negative |
---|---|---|
We study the notion of the scaled entropy of a filtration of $\sigma$-fields (= decreasing sequence of $\sigma$-fields) introduced by the first author ({V4}). We suggest a method for computing this entropy for the sequence of $\sigma$-fields of pasts of a Markov process determined by a random walk over the trajectories of a Bernoulli action of a commutative or nilpotent countable group (Theorems 5,6). Since the scaled entropy is a metric invariant of the filtration, it follows that the sequences of $\sigma$-fields of pasts of random walks over the trajectories of Bernoulli actions of lattices (groups ${\Bbb Z}^d$) are metrically nonisomorphic for different dimensions $d$, and for the same $d$ but different values of the entropy of the Bernoulli scheme. We give a brief survey of the metric theory of filtrations, in particular, formulate the standardness criterion and describe its connections with the scaled entropy and the notion of a tower of measures. | In this paper we complete a classification of finite linear spaces $\cS$ with line size at most 12 admitting a line-transitive point-imprimitive subgroup of automorphisms. The examples are the Desarguesian projective planes of orders $4,7, 9$ and 11, two designs on 91 points with line size 6, and 467 designs on 729 points with line size 8. | We show that the combined data from solar, long-baseline and reactor neutrino experiments can exclude the generalized bicycle model of Lorentz noninvariant direction-dependent and/or direction-independent oscillations of massless neutrinos. This model has five parameters, which is more than is needed in standard oscillation phenomenology with neutrino masses. Solar data alone are sufficient to exclude the pure direction-dependent case. The combination of solar and long-baseline data rules out the pure direction-independent case. With the addition of KamLAND data, a mixture of direction-dependent and direction-independent terms in the effective Hamiltonian is also excluded. |
We discuss a numerical model for black hole growth and its associated feedback processes that for the first time allows cosmological simulations of structure formation to self-consistently follow the build up of the cosmic population of galaxies and active galactic nuclei. Our model assumes that seed black holes are present at early cosmic epochs at the centres of forming halos. We then track their growth from gas accretion and mergers with other black holes in the course of cosmic time. For black holes that are active, we distinguish between two distinct modes of feedback, depending on the black hole accretion rate itself. Black holes that accrete at high rates are assumed to be in a `quasar regime', where we model their feedback by thermally coupling a small fraction of their bolometric luminosity to the surrounding gas. For black holes with low accretion rates, we conjecture that most of their feedback occurs in mechanical form, where AGN-driven bubbles are injected into a gaseous e... | Context: L'-band (3.8 micron) images of the Galactic Center show a large number of thin filaments in the mini-spiral, located west of the mini-cavity and along the inner edge of the Northern Arm. One possible mechanism that could produce such structures is the interaction of a central wind with the mini-spiral. Additionally, we identify similar features that appear to be associated with stars. Aims: We present the first proper motion measurements of the thin dust filaments observed in the central parsec around SgrA* and investigate possible mechanisms that could be responsible for the observed motions. Methods: The observations have been carried out using the NACO adaptive optics system at the ESO VLT. The images have been transformed to a common coordinate system and features of interest were extracted. Then a cross-correlation technique could be performed in order to determine the offsets between the features with respect to their position in the reference epoch. Results: We derive t... | We consider a social system of interacting heterogeneous agents with learning abilities, a model close to Random Field Ising Models, where the random field corresponds to the idiosyncratic willingness to pay. Given a fixed price, agents decide repeatedly whether to buy or not a unit of a good, so as to maximize their expected utilities. We show that the equilibrium reached by the system depends on the nature of the information agents use to estimate their expected utilities. |
Low-energy dipole excitations have been investigated theoretically in N=50, several N=82 isotones and the Z=50 Sn isotopes. For this purpose a method incorporating both HFB and multi-phonon QPM theory is applied. A concentration of one-phonon dipole strength located below the neutron emission threshold has been calculated in these nuclei. The analysis of the corresponding neutron and proton dipole transition densities allows to assign a genuine pattern to the low-energy excitations and making them distinct from the conventional GDR modes. Analyzing also the QRPA wave functions of the states we can identify these excitations as Pygmy Dipole Resonance (PDR) modes, recently studied also in Sn and N=82 nuclei. The results for N=50 are exploratory for an experimental project designed for the bremsstrahlung facility at the ELBE accelerator. | The NA60 experiment is a fixed-target experiment at the CERN SPS. It has measured the dimuon yield in Indium--Indium collisions with an In beam of 158 AGeV/c and in p-A collisions with a proton beam of 400 and 158 AGeV/c. The results allow to address three important physics topics, namely the study of the rho spectral function in nuclear collisions, the clarification of the origin of the dimuon excess measured by NA50 in the intermediate mass range, and the J/psi suppression pattern in a collision system different from Pb-Pb. An overview of these results will be given in this paper. | We examine the stability of a trapped dipolar condensate mixed with a single-component fermion gas at T=0. Whereas pure dipolar condensates with small s-wave interaction are unstable even for small dipole-dipole interaction strength, we find that the admixture of fermions can significantly stabilize them, depending on the strength of the boson-fermion interaction. Within the stable regime we find a region where a ground state is characterized by a density wave along the soft trap direction. |
- Loss: TripletLoss with these parameters: { "distance_metric": "TripletDistanceMetric.EUCLIDEAN", "triplet_margin": 5 }
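To make the loss parameters concrete: with a Euclidean distance metric and a margin of 5, the triplet loss for a single example is max(d(anchor, positive) - d(anchor, negative) + 5, 0), so it reaches zero once the negative is at least 5 units farther from the anchor than the positive. The following is a minimal pure-Python sketch of that formula for one triplet; the function names and 2-dimensional points are illustrative assumptions, and the library averages this quantity over a batch.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=5.0):
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0) for one triplet."""
    gap = euclidean_distance(anchor, positive) - euclidean_distance(anchor, negative)
    return max(gap + margin, 0.0)


# Toy points (assumption): d(anchor, positive) = 5 in both cases.
loss_active = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 6.0])   # 5 - 6 + 5 = 4
loss_zero = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 12.0])    # 5 - 12 + 5 < 0, clamped to 0

print(loss_active, loss_zero)
```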
Evaluation Dataset
Unnamed Dataset
- Size: 1,000 evaluation samples
- Columns: anchor, positive, and negative
- Approximate statistics based on the first 1000 samples:
Column | Type | Min | Mean | Max |
---|---|---|---|---|
anchor | string | 39 tokens | 175.26 tokens | 512 tokens |
positive | string | 39 tokens | 173.33 tokens | 512 tokens |
negative | string | 39 tokens | 167.47 tokens | 512 tokens |
- Samples:
anchor | positive | negative |
---|---|---|
We disprove a conjecture of A. Koldobsky asking whether it is enough to compare $(n-2)$-derivatives of the projection functions of two symmetric convex bodies in the Shephard problem in order to get a positive answer in all dimensions. | The projective degrees of strict partitions of n were computed for all n < 101 and the partitions with maximal projective degree were found for each n. It was observed that maximizing partitions for successive values of n "lie close to each other" in a certain sense. Conjecturing that this holds for larger values of n, the partitions of maximal degree were computed for all n < 221. The results are consistent with a recent conjecture on the limiting shape of the strict partition of maximal projective degree. | In [1] was considered the superintegrable system which describes the magnetic dipole with spin 1/2 (neutron) in the field of linear current. Here we present its generalization for any spin which preserves superintegrability. The dynamical symmetry stays the same as it is for spin 1/2. |
We develop a method for measuring and localizing homology classes. This involves two problems. First, we define relevant notions of size for both a homology class and a homology group basis, using ideas from relative homology. Second, we propose an algorithm to compute the optimal homology basis, using techniques from persistent homology and finite field algebra. Classes of the computed optimal basis are localized with cycles conveying their sizes. The algorithm runs in $O(\beta^4 n^3 \log^2 n)$ time, where $n$ is the size of the simplicial complex and $\beta$ is the Betti number of the homology group. | We consider two-way wire-tap channels, where two users are communicating with each other in the presence of an eavesdropper, who has access to the communications through a multiple-access channel. We find achievable rates for two different scenarios, the Gaussian two-way wire-tap channel, (GTW-WT), and the binary additive two-way wire-tap channel, (BATW-WT). It is shown that the two-way channels inherently provide a unique advantage for wire-tapped scenarios, as the users know their own transmitted signals and in effect help encrypt the other user's messages, similar to a one-time pad. We compare the achievable rates to that of the Gaussian multiple-access wire-tap channel (GMAC-WT) to illustrate this advantage. | We report quantitative relations between corruption level and economic factors, such as country wealth and foreign investment per capita, which are characterized by a power law spanning multiple scales of wealth and investments per capita. These relations hold for diverse countries, and also remain stable over different time periods. We also observe a negative correlation between level of corruption and long-term economic growth. We find similar results for two independent indices of corruption, suggesting that the relation between corruption and wealth does not depend on the specific measure of corruption. The functional relations we report have implications when assessing the relative level of corruption for two countries with comparable wealth, and for quantifying the impact of corruption on economic growth and foreign investments. |
The paper addresses the space-frequency correlations of electromagnetic waves in general random, bi-anisotropic media whose constitutive tensors are complex Hermitian matrices. The two-frequency Wigner distribution (2f-WD) for polarized waves is introduced to describe the space-frequency correlations and the closed form Wigner-Moyal equation is derived from the Maxwell equations. Two-frequency radiative transfer (2f-RT) equations is then derived from the Wigner-Moyal equation by using the multiple scale expansion. For the simplest isotropic medium, the result coincides with Chandrasekhar's transfer equation. In birefringent media, the 2f-RT equations take the scalar form due to the absence of depolarization. A number of birefringent media such as the chiral, uniaxial and gyrotropic media are examined. For the unpolarized wave in the isotropic medium the 2f-RT equations reduces to the Fokker-Planck equation previously derived in Part I. A similar Fokker-Planck equation is derived from t... | In this paper, it is shown that the cosmological model that was introduced in a sequence of three earlier papers under the title, A Dust Universe Solution to the Dark Energy Problem, can be used to resolve the problem of the great mismatch of numerical values between dark energy from cosmology and zero point energy from quantum theory. It is shown that, if the zero point energies for the cosmic microwave background and for all the rest of the universe that is not cosmic microwave background are introduced into this model as two entities, their separate values appear within this theory in the form of a numerical difference. It is this difference that gives the numerical value for the zero point value of Einstein's dark energy density. Consequently, although the two zero point energies may be large, their difference can give the known small dark energy value from cosmology for dark energy density. Issues relating to interpretation, calculation and measurement associated with this result ... | We demonstrate spin injection into a graphene thin film with high reliability by using non-local magnetoresistance (MR) measurements, in which the electric current path is completely separated from the spin current path. Using these non-local measurements, an obvious MR effect was observed at room temperature; and the MR effect was ascribed to magnetization reversal of ferromagnetic electrodes. This result is a direct demonstration of spin injection into a graphene thin film. Furthermore, this is the first report of spin injection into molecules at room temperature. |
- Loss: TripletLoss with these parameters: { "distance_metric": "TripletDistanceMetric.EUCLIDEAN", "triplet_margin": 5 }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- gradient_accumulation_steps: 2
- learning_rate: 2e-05
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- fp16: True
- dataloader_num_workers: 1
- dataloader_pin_memory: False
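The combination of `learning_rate: 2e-05`, `lr_scheduler_type: cosine`, and `warmup_ratio: 0.1` implies a learning rate that ramps up linearly over the first 10% of optimizer steps and then decays along a half cosine toward zero. The sketch below approximates that shape in pure Python; the function name `lr_at_step` and the total of 1000 steps are hypothetical, and the exact rounding of the warmup length in the training framework may differ.

```python
import math


def lr_at_step(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    """Approximate learning rate: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


TOTAL_STEPS = 1000  # hypothetical value chosen only for illustration
schedule = [lr_at_step(step, TOTAL_STEPS) for step in range(TOTAL_STEPS + 1)]

print(schedule[0], max(schedule), schedule[-1])
```

The peak equals the configured learning rate at the end of warmup, and the final value is effectively zero.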
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 2
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 1
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: False
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss | Validation Loss | triplet_eval_cosine_accuracy |
---|---|---|---|---|
-1 | -1 | - | - | 0.8210 |
0.8 | 500 | 2.5031 | 0.7956 | 0.9410 |
1.6 | 1000 | 1.0464 | 0.7594 | 0.9450 |
2.4 | 1500 | 0.5218 | 0.7086 | 0.9480 |
-1 | -1 | - | - | 0.9470 |
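The Epoch and Step columns in the log can be cross-checked against the reported settings: 10,000 training samples, a per-device batch size of 8, and 2 gradient accumulation steps. The arithmetic below is a small sketch of that check; it assumes training ran on a single device, which the card does not state.

```python
import math

NUM_SAMPLES = 10_000   # training set size reported above
PER_DEVICE_BATCH = 8   # per_device_train_batch_size
GRAD_ACCUM = 2         # gradient_accumulation_steps
NUM_EPOCHS = 3         # num_train_epochs
NUM_DEVICES = 1        # assumption: single-device training

# Batches per epoch, then optimizer steps after gradient accumulation.
batches_per_epoch = math.ceil(NUM_SAMPLES / (PER_DEVICE_BATCH * NUM_DEVICES))
steps_per_epoch = batches_per_epoch // GRAD_ACCUM
total_steps = steps_per_epoch * NUM_EPOCHS

# Epoch fraction reached at each logged step.
logged_epochs = [step / steps_per_epoch for step in (500, 1000, 1500)]

print(steps_per_epoch, total_steps, logged_epochs)
```

Under that assumption, each epoch takes 625 optimizer steps, which agrees with the logged pairs (0.8, 500), (1.6, 1000), and (2.4, 1500).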
Framework Versions
- Python: 3.11.13
- Sentence Transformers: 4.1.0
- Transformers: 4.52.4
- PyTorch: 2.6.0+cu124
- Accelerate: 1.8.1
- Datasets: 2.14.4
- Tokenizers: 0.21.2
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
TripletLoss
@misc{hermans2017defense,
title={In Defense of the Triplet Loss for Person Re-Identification},
author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
year={2017},
eprint={1703.07737},
archivePrefix={arXiv},
primaryClass={cs.CV}
}