SentenceTransformer
This is a sentence-transformers model trained on (topic, content) pairs. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Maximum Sequence Length: 128 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
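These properties can be checked directly on a loaded model; a minimal sketch, assuming the same placeholder model id used in the Usage section below:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id
print(model.get_max_seq_length())                # 128
print(model.get_sentence_embedding_dimension())  # 768
```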
Model Sources
- Documentation: [Sentence Transformers Documentation](https://sbert.net)
- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
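For reference, a roughly equivalent pipeline can be assembled by hand from the three modules above. This is a sketch: the base checkpoint name `microsoft/mpnet-base` is an assumption, since the card only states that the Transformer module wraps an MPNetModel.
```python
from sentence_transformers import SentenceTransformer, models

# Module (0): a 128-token MPNet encoder (checkpoint name is an assumption).
word_embedding = models.Transformer("microsoft/mpnet-base", max_seq_length=128)
# Module (1): mean pooling over token embeddings (768 dimensions).
pooling = models.Pooling(word_embedding.get_word_embedding_dimension(), pooling_mode="mean")
# Module (2): L2-normalize the pooled sentence embedding.
model = SentenceTransformer(modules=[word_embedding, pooling, models.Normalize()])
```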
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'Determining Unknown Angles in Complex Composite Figures. Triangles. Geometry. Grade 4. Elementary Math. Math. K-12. ',
'Determining Unknown Angles in Complex Composite Figures. . ',
"Initial value & common ratio of exponential functions. Get comfortable with the basic ingredients of exponential functions: the\nInitial value and the common ratio.\n\n. - [Voiceover] So let's think about a function. I'll just give an example. Let's say, h of n is equal to one-fourth times two to the n. So, first of all, you might notice something interesting here. We have the variable, the input into our function. It's in the exponent. And a function like this is called an exponential function. So this is an exponential. Ex-po-nen-tial. Exponential function, and that's because the variable, the input into our function, is sitting in its definition of what is the output of that function going to be. The input is in the exponent. I could write another exponential function. I could write, f of, let's say the input is a variable, t, is equal to is equal to five times times three to the t. Once again, this is an exponential function. Now there's a couple of interesting things to think about in exponential function. In fact, we'll explore many of them, but I'll get a little used to the terminology, so one thing that you might see is a notion of an initial value. In-i-tial Intitial value. And this is essentially the value of the function when the input is zero. So, for in these cases, the initial value for the function, h, is going to be, h of zero. And when we evaluate that, that's going to be one-fourth times two to the zero. Well, two to the zero power, is just one. So it's equal to one-fourth. So the initial value, at least in this case, it seems to just be that number that sits out here. We have the initial value times some number to this exponent. And we'll come up with the name for this number. Well let's see if this was true over here for, f of t. So, if we look at its intial value, f of zero is going to be five times three to the zero power and, the same thing again. Three to the zero is just one. Five times one is just five. So the initial value is once again, that. So if you have exponential functions of this form, it makes sense. Your initial value, well if you put a zero in for the exponent, then the number raised to the exponent is just going to be one, and you're just going to be left with that thing that you're multiplying by that. Hopefully that makes sense, but since you're looking at it, hopefully it does make a little bit. Now, you might be saying, well what do we call this number? What do we call that number there? Or that number there? And that's called the common ratio. The common common ratio. And in my brain, we say well why is it called a common ratio? Well, if you thought about integer inputs into this, especially sequential integer inputs into it, you would see a pattern. For example, h of, let me do this in that green color, h of zero is equal to, we already established one-fourth. Now, what is h of one going to be equal to? It's going to be one-fourth times two to the first power. So it's going to be one-fourth times two. What is h of two going to be equal to? Well, it's going to be one-fourth times two squared, so it's going to be times two times two. Or, we could just view this as this is going to be two times h of one. And actually I should have done this when I wrote this one out, but this we can write as two times h of zero. So notice, if we were to take the ratio between h of two and h of one, it would be two. If we were to take the ratio between h of one and h of zero, it would be two. 
That is the common ratio between successive whole number inputs into our function. So, h of I could say h of n plus one over h of n is going to be equal to is going to be equal to actually I can work it out mathematically. One-fourth times two to the n plus one over one-fourth times two to the n. That cancels. Two to the n plus one, divided by two to the n is just going to be equal to two. That is your common ratio. So for the function h. For the function f, our common ratio is three. If we were to go the other way around, if someone said, hey, I have some function whose initial value, so let's say, I have some function, I'll do this in a new color, I have some function, g, and we know that its initial initial value is five. And someone were to say its common ratio its common ratio is six, what would this exponential function look like? And they're telling you this is an exponential function. Well, g of let's say x is the input, is going to be equal to our initial value, which is five. That's not a negative sign there, Our initial value is five. I'll write equals to make that clear. And then times our common ratio to the x power. So once again, initial value, right over there, that's the five. And then our common ratio is the six, right over there. So hopefully that gets you a little bit familiar with some of the parts of an exponential function, why they are called what they are called.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
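Because the final Normalize module L2-normalizes the embeddings, cosine similarity here is just a dot product, and the same two calls cover a basic semantic-search pattern. A hypothetical follow-up that reuses `model` and `sentences` from the snippet above:
```python
# Treat the first sentence as a query and rank the other two against it.
query_embedding = model.encode(sentences[0])
doc_embeddings = model.encode(sentences[1:])

scores = model.similarity(query_embedding, doc_embeddings)  # shape [1, 2]
best = int(scores.argmax())
print(scores)
print(sentences[1:][best][:60])  # the content closest to the topic-style query
```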
Evaluation
Metrics
Information Retrieval
- Dataset: `eval-ir`
- Evaluated with `InformationRetrievalEvaluator`
Metric | Value |
---|---|
cosine_accuracy@1 | 0.6326 |
cosine_accuracy@3 | 0.7914 |
cosine_accuracy@5 | 0.8481 |
cosine_accuracy@10 | 0.8968 |
cosine_precision@10 | 0.2383 |
cosine_precision@50 | 0.0709 |
cosine_precision@100 | 0.0392 |
cosine_recall@10 | 0.7041 |
cosine_recall@50 | 0.8725 |
cosine_recall@100 | 0.917 |
cosine_ndcg@10 | 0.6529 |
cosine_mrr@10 | 0.7234 |
cosine_map@100 | 0.5971 |
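A sketch of how such an evaluation can be reproduced with `InformationRetrievalEvaluator`. The actual eval-ir queries, corpus, and relevance judgments are not published with this card, so the three dicts below are hypothetical placeholders in the same topic-to-content style:
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id

# Hypothetical placeholder data; the real eval-ir split is not included here.
queries = {"q0": "Classifying triangles. Geometry. Grade 4. Elementary Math."}
corpus = {
    "d0": "Classifying triangles based on its angles.",
    "d1": "Initial value and common ratio of exponential functions.",
}
relevant_docs = {"q0": {"d0"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="eval-ir")
results = evaluator(model)
print(results["eval-ir_cosine_ndcg@10"])
```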
Training Details
Training Dataset
Unnamed Dataset
- Size: 190,175 training samples
- Columns: `topic` and `content`
- Approximate statistics based on the first 1000 samples:

 | topic | content |
---|---|---|
type | string | string |
details | min: 15 tokens, mean: 41.93 tokens, max: 128 tokens | min: 5 tokens, mean: 62.57 tokens, max: 128 tokens |
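To recompute statistics like these for new data, something like the sketch below works; the two-row `pairs` dict is a hypothetical stand-in, since the training dataset itself is unnamed in this card:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id

# Hypothetical stand-in for the first 1000 (topic, content) samples.
pairs = {
    "topic": ["Triangles and polygons. Form 1.", "Classifying triangles. Grade 4."],
    "content": ["Regular and Irregular Polygons.", "Learn to categorize triangles."],
}

def token_stats(texts):
    """Return (min, mean, max) token counts using the model's own tokenizer."""
    lengths = [len(model.tokenizer(t)["input_ids"]) for t in texts]
    return min(lengths), sum(lengths) / len(lengths), max(lengths)

print("topic:", token_stats(pairs["topic"]))
print("content:", token_stats(pairs["content"]))
```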
- Samples:

topic | content |
---|---|
Triangles and polygons. Space, shape and measurement. Form 1. Malawi Mathematics Syllabus. Learning outcomes: students must be able to solve problems involving angles, triangles and polygons including: types of triangles, calculate the interior and exterior angles of a triangle, different types of polygons, interior angles and sides of a convex polygon, the size and exterior angle of any convex polygon. | Regular and Irregular Polygons. . |
Triangles and polygons. Space, shape and measurement. Form 1. Malawi Mathematics Syllabus. Learning outcomes: students must be able to solve problems involving angles, triangles and polygons including: types of triangles, calculate the interior and exterior angles of a triangle, different types of polygons, interior angles and sides of a convex polygon, the size and exterior angle of any convex polygon. | Classifying triangles based on its angles. A triangle is a closed figure consisting of three-line segments which are joined end to end. The joined line segments of a triangle form three angles. You can classify triangles according to sides and angles.. Classifying triangles based on its angles Albert Mhango, Mzimba Introduction: A triangle is a closed figure consisting of three-line segments which are joined end to end. The joined line segments of a triangle form three angles. You can classify triangles according to sides and angles. What is an interior angle? An interior angle is an inside of a shape. Explanation: When classifying triangles according to its angles, you look at the sizes of their interior angles. Under this classification, you have the following types of triangles: 1. Acute angled triangle: A triangle in which all interior angles are acute angles. Do you remember the meaning of acute angle? It is an angle which is less than 90°. Figure shows an example of an acute an... |
Triangles and polygons. Space, shape and measurement. Form 1. Malawi Mathematics Syllabus. Learning outcomes: students must be able to solve problems involving angles, triangles and polygons including: types of triangles, calculate the interior and exterior angles of a triangle, different types of polygons, interior angles and sides of a convex polygon, the size and exterior angle of any convex polygon. | Classifying triangles. Learn to categorize triangles as scalene, isosceles, equilateral, acute, right, or obtuse. . What I want to do in this video is talk about the two main ways that triangles are categorized. The first way is based on whether or not the triangle has equal sides, or at least a few equal sides. Then the other way is based on the measure of the angles of the triangle. So the first categorization right here, and all of these are based on whether or not the triangle has equal sides, is scalene. And a scalene triangle is a triangle where none of the sides are equal. So for example, if I have a triangle like this, where this side has length 3, this side has length 4, and this side has length 5, then this is going to be a scalene triangle. None of the sides have an equal length. Now an isosceles triangle is a triangle where at least two of the sides have equal lengths. So for example, this would be an isosceles triangle. Maybe this has length 3, this has length 3, and this... |

- Loss: `MultipleNegativesRankingLoss` with these parameters: `{"scale": 20.0, "similarity_fct": "cos_sim"}`
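In code, this loss configuration corresponds roughly to the following sketch; `scale=20.0` and `cos_sim` match the listed parameters, and the model id is this card's placeholder:
```python
from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)
```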
Training Hyperparameters
Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 128
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.05
- `fp16`: True
- `load_best_model_at_end`: True
- `batch_sampler`: no_duplicates
All Hyperparameters
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 128
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.05
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
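Putting the dataset, loss, and hyperparameters together, the training run corresponds roughly to the sketch below. The base checkpoint and the tiny in-memory dataset are hypothetical stand-ins: the card does not name the base model or publish the 190,175-pair training set.
```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("microsoft/mpnet-base")  # assumed base checkpoint

# Hypothetical two-row stand-in for the real (topic, content) training pairs.
train_dataset = Dataset.from_dict({
    "topic": ["Triangles and polygons. Form 1.", "Classifying triangles. Grade 4."],
    "content": ["Regular and Irregular Polygons.", "Learn to categorize triangles."],
})

# The non-default hyperparameters listed above.
args = SentenceTransformerTrainingArguments(
    output_dir="outputs",  # placeholder
    eval_strategy="steps",
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.05,
    fp16=True,
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

loss = losses.MultipleNegativesRankingLoss(model)  # scale=20.0, cos_sim by default

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,  # placeholder eval split for eval_strategy="steps"
    loss=loss,
)
trainer.train()
```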
Training Logs
Epoch | Step | Training Loss | eval-ir_cosine_ndcg@10 |
---|---|---|---|
0.0007 | 1 | 0.1782 | - |
0.1999 | 297 | 0.1245 | 0.6279 |
0.3997 | 594 | 0.1224 | 0.6423 |
0.5996 | 891 | 0.1168 | 0.6493 |
0.7995 | 1188 | 0.1179 | 0.6541 |
0.9993 | 1485 | 0.1227 | 0.6529 |
Framework Versions
- Python: 3.11.13
- Sentence Transformers: 4.1.0
- Transformers: 4.52.4
- PyTorch: 2.6.0+cu124
- Accelerate: 1.7.0
- Datasets: 2.14.4
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```