SentenceTransformer
This is a sentence-transformers model trained on (topic, content) pairs. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Maximum Sequence Length: 128 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
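These properties can be checked directly on a loaded model; a minimal sketch, assuming the same placeholder model id used in the Usage section below:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id
print(model.get_max_seq_length())                # 128
print(model.get_sentence_embedding_dimension())  # 768
```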
Model Sources
- Documentation: [Sentence Transformers Documentation](https://sbert.net)
- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
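For reference, a roughly equivalent pipeline can be assembled by hand from the three modules above. This is a sketch: the base checkpoint name `microsoft/mpnet-base` is an assumption, since the card only states that the Transformer module wraps an MPNetModel.
```python
from sentence_transformers import SentenceTransformer, models

# Module (0): a 128-token MPNet encoder (checkpoint name is an assumption).
word_embedding = models.Transformer("microsoft/mpnet-base", max_seq_length=128)
# Module (1): mean pooling over token embeddings (768 dimensions).
pooling = models.Pooling(word_embedding.get_word_embedding_dimension(), pooling_mode="mean")
# Module (2): L2-normalize the pooled sentence embedding.
model = SentenceTransformer(modules=[word_embedding, pooling, models.Normalize()])
```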
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'Determining Unknown Angles in Complex Composite Figures. Triangles. Geometry. Grade 4. Elementary Math. Math. K-12. ',
'Determining Unknown Angles in Complex Composite Figures. . ',
"Initial value & common ratio of exponential functions. Get comfortable with the basic ingredients of exponential functions: the\nInitial value and the common ratio.\n\n. - [Voiceover] So let's think about a function. I'll just give an example. Let's say, h of n is equal to one-fourth times two to the n. So, first of all, you might notice something interesting here. We have the variable, the input into our function. It's in the exponent. And a function like this is called an exponential function. So this is an exponential. Ex-po-nen-tial. Exponential function, and that's because the variable, the input into our function, is sitting in its definition of what is the output of that function going to be. The input is in the exponent. I could write another exponential function. I could write, f of, let's say the input is a variable, t, is equal to is equal to five times times three to the t. Once again, this is an exponential function. Now there's a couple of interesting things to think about in exponential function. In fact, we'll explore many of them, but I'll get a little used to the terminology, so one thing that you might see is a notion of an initial value. In-i-tial Intitial value. And this is essentially the value of the function when the input is zero. So, for in these cases, the initial value for the function, h, is going to be, h of zero. And when we evaluate that, that's going to be one-fourth times two to the zero. Well, two to the zero power, is just one. So it's equal to one-fourth. So the initial value, at least in this case, it seems to just be that number that sits out here. We have the initial value times some number to this exponent. And we'll come up with the name for this number. Well let's see if this was true over here for, f of t. So, if we look at its intial value, f of zero is going to be five times three to the zero power and, the same thing again. Three to the zero is just one. Five times one is just five. So the initial value is once again, that. So if you have exponential functions of this form, it makes sense. Your initial value, well if you put a zero in for the exponent, then the number raised to the exponent is just going to be one, and you're just going to be left with that thing that you're multiplying by that. Hopefully that makes sense, but since you're looking at it, hopefully it does make a little bit. Now, you might be saying, well what do we call this number? What do we call that number there? Or that number there? And that's called the common ratio. The common common ratio. And in my brain, we say well why is it called a common ratio? Well, if you thought about integer inputs into this, especially sequential integer inputs into it, you would see a pattern. For example, h of, let me do this in that green color, h of zero is equal to, we already established one-fourth. Now, what is h of one going to be equal to? It's going to be one-fourth times two to the first power. So it's going to be one-fourth times two. What is h of two going to be equal to? Well, it's going to be one-fourth times two squared, so it's going to be times two times two. Or, we could just view this as this is going to be two times h of one. And actually I should have done this when I wrote this one out, but this we can write as two times h of zero. So notice, if we were to take the ratio between h of two and h of one, it would be two. If we were to take the ratio between h of one and h of zero, it would be two. 
That is the common ratio between successive whole number inputs into our function. So, h of I could say h of n plus one over h of n is going to be equal to is going to be equal to actually I can work it out mathematically. One-fourth times two to the n plus one over one-fourth times two to the n. That cancels. Two to the n plus one, divided by two to the n is just going to be equal to two. That is your common ratio. So for the function h. For the function f, our common ratio is three. If we were to go the other way around, if someone said, hey, I have some function whose initial value, so let's say, I have some function, I'll do this in a new color, I have some function, g, and we know that its initial initial value is five. And someone were to say its common ratio its common ratio is six, what would this exponential function look like? And they're telling you this is an exponential function. Well, g of let's say x is the input, is going to be equal to our initial value, which is five. That's not a negative sign there, Our initial value is five. I'll write equals to make that clear. And then times our common ratio to the x power. So once again, initial value, right over there, that's the five. And then our common ratio is the six, right over there. So hopefully that gets you a little bit familiar with some of the parts of an exponential function, why they are called what they are called.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
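Because the final Normalize module L2-normalizes the embeddings, cosine similarity here is just a dot product, and the same two calls cover a basic semantic-search pattern. A hypothetical follow-up that reuses `model` and `sentences` from the snippet above:
```python
# Treat the first sentence as a query and rank the other two against it.
query_embedding = model.encode(sentences[0])
doc_embeddings = model.encode(sentences[1:])

scores = model.similarity(query_embedding, doc_embeddings)  # shape [1, 2]
best = int(scores.argmax())
print(scores)
print(sentences[1:][best][:60])  # the content closest to the topic-style query
```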
Evaluation
Metrics
Information Retrieval
- Dataset: `eval-ir`
- Evaluated with `InformationRetrievalEvaluator`
Metric | Value |
---|---|
cosine_accuracy@1 | 0.6326 |
cosine_accuracy@3 | 0.7914 |
cosine_accuracy@5 | 0.8481 |
cosine_accuracy@10 | 0.8968 |
cosine_precision@10 | 0.2383 |
cosine_precision@50 | 0.0709 |
cosine_precision@100 | 0.0392 |
cosine_recall@10 | 0.7041 |
cosine_recall@50 | 0.8725 |
cosine_recall@100 | 0.917 |
cosine_ndcg@10 | 0.6529 |
cosine_mrr@10 | 0.7234 |
cosine_map@100 | 0.5971 |
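A sketch of how such an evaluation can be reproduced with `InformationRetrievalEvaluator`. The actual eval-ir queries, corpus, and relevance judgments are not published with this card, so the three dicts below are hypothetical placeholders in the same topic-to-content style:
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id

# Hypothetical placeholder data; the real eval-ir split is not included here.
queries = {"q0": "Classifying triangles. Geometry. Grade 4. Elementary Math."}
corpus = {
    "d0": "Classifying triangles based on its angles.",
    "d1": "Initial value and common ratio of exponential functions.",
}
relevant_docs = {"q0": {"d0"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="eval-ir")
results = evaluator(model)
print(results["eval-ir_cosine_ndcg@10"])
```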
Training Details
Training Dataset
Unnamed Dataset
- Size: 190,175 training samples
- Columns: `topic` and `content`
- Approximate statistics based on the first 1000 samples:

 | topic | content |
---|---|---|
type | string | string |
details | min: 15 tokens, mean: 41.93 tokens, max: 128 tokens | min: 5 tokens, mean: 62.57 tokens, max: 128 tokens |
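To recompute statistics like these for new data, something like the sketch below works; the two-row `pairs` dict is a hypothetical stand-in, since the training dataset itself is unnamed in this card:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id

# Hypothetical stand-in for the first 1000 (topic, content) samples.
pairs = {
    "topic": ["Triangles and polygons. Form 1.", "Classifying triangles. Grade 4."],
    "content": ["Regular and Irregular Polygons.", "Learn to categorize triangles."],
}

def token_stats(texts):
    """Return (min, mean, max) token counts using the model's own tokenizer."""
    lengths = [len(model.tokenizer(t)["input_ids"]) for t in texts]
    return min(lengths), sum(lengths) / len(lengths), max(lengths)

print("topic:", token_stats(pairs["topic"]))
print("content:", token_stats(pairs["content"]))
```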
- Samples:

topic | content |
---|---|
Triangles and polygons. Space, shape and measurement. Form 1. Malawi Mathematics Syllabus. Learning outcomes: students must be able to solve problems involving angles, triangles and polygons including: types of triangles, calculate the interior and exterior angles of a triangle, different types of polygons, interior angles and sides of a convex polygon, the size and exterior angle of any convex polygon. | Regular and Irregular Polygons. . |
Triangles and polygons. Space, shape and measurement. Form 1. Malawi Mathematics Syllabus. Learning outcomes: students must be able to solve problems involving angles, triangles and polygons including: types of triangles, calculate the interior and exterior angles of a triangle, different types of polygons, interior angles and sides of a convex polygon, the size and exterior angle of any convex polygon. | Classifying triangles based on its angles. A triangle is a closed figure consisting of three-line segments which are joined end to end. The joined line segments of a triangle form three angles. You can classify triangles according to sides and angles.. Classifying triangles based on its angles Albert Mhango, Mzimba Introduction: A triangle is a closed figure consisting of three-line segments which are joined end to end. The joined line segments of a triangle form three angles. You can classify triangles according to sides and angles. What is an interior angle? An interior angle is an inside of a shape. Explanation: When classifying triangles according to its angles, you look at the sizes of their interior angles. Under this classification, you have the following types of triangles: 1. Acute angled triangle: A triangle in which all interior angles are acute angles. Do you remember the meaning of acute angle? It is an angle which is less than 90°. Figure shows an example of an acute an... |
Triangles and polygons. Space, shape and measurement. Form 1. Malawi Mathematics Syllabus. Learning outcomes: students must be able to solve problems involving angles, triangles and polygons including: types of triangles, calculate the interior and exterior angles of a triangle, different types of polygons, interior angles and sides of a convex polygon, the size and exterior angle of any convex polygon. | Classifying triangles. Learn to categorize triangles as scalene, isosceles, equilateral, acute, right, or obtuse. . What I want to do in this video is talk about the two main ways that triangles are categorized. The first way is based on whether or not the triangle has equal sides, or at least a few equal sides. Then the other way is based on the measure of the angles of the triangle. So the first categorization right here, and all of these are based on whether or not the triangle has equal sides, is scalene. And a scalene triangle is a triangle where none of the sides are equal. So for example, if I have a triangle like this, where this side has length 3, this side has length 4, and this side has length 5, then this is going to be a scalene triangle. None of the sides have an equal length. Now an isosceles triangle is a triangle where at least two of the sides have equal lengths. So for example, this would be an isosceles triangle. Maybe this has length 3, this has length 3, and this... |

- Loss: `MultipleNegativesRankingLoss` with these parameters: `{"scale": 20.0, "similarity_fct": "cos_sim"}`
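In code, this loss configuration corresponds roughly to the following sketch; `scale=20.0` and `cos_sim` match the listed parameters, and the model id is this card's placeholder:
```python
from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)
```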
Training Hyperparameters
Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 128
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.05
- `fp16`: True
- `load_best_model_at_end`: True
- `batch_sampler`: no_duplicates
All Hyperparameters
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 128
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.05
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
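Putting the dataset, loss, and hyperparameters together, the training run corresponds roughly to the sketch below. The base checkpoint and the tiny in-memory dataset are hypothetical stand-ins: the card does not name the base model or publish the 190,175-pair training set.
```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("microsoft/mpnet-base")  # assumed base checkpoint

# Hypothetical two-row stand-in for the real (topic, content) training pairs.
train_dataset = Dataset.from_dict({
    "topic": ["Triangles and polygons. Form 1.", "Classifying triangles. Grade 4."],
    "content": ["Regular and Irregular Polygons.", "Learn to categorize triangles."],
})

# The non-default hyperparameters listed above.
args = SentenceTransformerTrainingArguments(
    output_dir="outputs",  # placeholder
    eval_strategy="steps",
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.05,
    fp16=True,
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

loss = losses.MultipleNegativesRankingLoss(model)  # scale=20.0, cos_sim by default

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,  # placeholder eval split for eval_strategy="steps"
    loss=loss,
)
trainer.train()
```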
Training Logs
Epoch | Step | Training Loss | eval-ir_cosine_ndcg@10 |
---|---|---|---|
0.0007 | 1 | 0.1782 | - |
0.1999 | 297 | 0.1245 | 0.6279 |
0.3997 | 594 | 0.1224 | 0.6423 |
0.5996 | 891 | 0.1168 | 0.6493 |
0.7995 | 1188 | 0.1179 | 0.6541 |
0.9993 | 1485 | 0.1227 | 0.6529 |
Framework Versions
- Python: 3.11.13
- Sentence Transformers: 4.1.0
- Transformers: 4.52.4
- PyTorch: 2.6.0+cu124
- Accelerate: 1.7.0
- Datasets: 2.14.4
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```