This is a sentence-transformers model finetuned from nomic-ai/modernbert-embed-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: ModernBertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
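The last two modules above (mean-token pooling followed by `Normalize()`) can be illustrated with a small pure-Python sketch. It is not the library's implementation: the helper names `mean_pool` and `l2_normalize` are hypothetical, and the toy vectors have width 4 instead of 768.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [s / count for s in summed]

def l2_normalize(vec):
    """Rescale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

# Toy input (assumption): three token vectors of width 4; the last one is padding.
tokens = [[1.0, 0.0, 2.0, 0.0], [3.0, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
sentence_embedding = l2_normalize(mean_pool(tokens, mask))
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity.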
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'What is described in Section 25 of the Arbitration Act?',
    '. (3) The provision of subsections (1) and (2) shall apply only to the extent agreed to by the parties. (4) The arbitral tribunal shall decide according to considerations of general justice and fairness or trade usages only if the parties have expressly authorised it to do so. Section 25 of the Arbitration Act describes the form and content of the arbitral award as follows: 25',
    '. 9 and 10 based on the objection taken to them by the Counsel for HNB, despite the fact that they did not arise from the pleadings, and were altogether inconsistent with them, answered the afore-stated question of law (in respect of which this Court had granted Leave to Appeal in that case) in the affirmative and in favour of HNB, and stated as follows: “In conclusion, it needs to be emphasised',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
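Since the model was trained with Matryoshka dimensions of 768, 512, 256, 128 and 64, an embedding can in principle be shortened by keeping its leading components and re-normalizing. The sketch below shows that idea in pure Python on toy 8-dimensional vectors; the helper names and the vectors are hypothetical and stand in for real model outputs.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(v * v for v in head))
    return [v / norm for v in head] if norm > 0 else head

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (assumption): real embeddings from this model have 768 components.
query = [0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
passage = [0.5, 0.5, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0]

full_score = cosine_similarity(query, passage)
short_score = cosine_similarity(
    truncate_and_normalize(query, 2),
    truncate_and_normalize(passage, 2),
)
```

Scores generally change after truncation, which is why the evaluation reports each dimension separately. With the Sentence Transformers library itself, passing `truncate_dim` when constructing `SentenceTransformer` is the usual way to get shortened embeddings.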
Datasets: dim_768, dim_512, dim_256, dim_128 and dim_64
Evaluated with InformationRetrievalEvaluator

| Metric | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
|---|---|---|---|---|---|
| cosine_accuracy@1 | 0.5741 | 0.5741 | 0.5552 | 0.4971 | 0.3968 |
| cosine_accuracy@3 | 0.7616 | 0.7631 | 0.7282 | 0.6759 | 0.5581 |
| cosine_accuracy@5 | 0.8198 | 0.8212 | 0.7922 | 0.7355 | 0.6221 |
| cosine_accuracy@10 | 0.8852 | 0.875 | 0.8619 | 0.8241 | 0.7253 |
| cosine_precision@1 | 0.5741 | 0.5741 | 0.5552 | 0.4971 | 0.3968 |
| cosine_precision@3 | 0.2539 | 0.2544 | 0.2427 | 0.2253 | 0.186 |
| cosine_precision@5 | 0.164 | 0.1642 | 0.1584 | 0.1471 | 0.1244 |
| cosine_precision@10 | 0.0885 | 0.0875 | 0.0862 | 0.0824 | 0.0725 |
| cosine_recall@1 | 0.5741 | 0.5741 | 0.5552 | 0.4971 | 0.3968 |
| cosine_recall@3 | 0.7616 | 0.7631 | 0.7282 | 0.6759 | 0.5581 |
| cosine_recall@5 | 0.8198 | 0.8212 | 0.7922 | 0.7355 | 0.6221 |
| cosine_recall@10 | 0.8852 | 0.875 | 0.8619 | 0.8241 | 0.7253 |
| cosine_ndcg@10 | 0.7308 | 0.7262 | 0.7078 | 0.6568 | 0.5514 |
| cosine_mrr@10 | 0.6812 | 0.6782 | 0.6586 | 0.6038 | 0.497 |
| cosine_map@100 | 0.6852 | 0.6828 | 0.6631 | 0.609 | 0.505 |
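In the table above, accuracy@k and recall@k coincide and precision@k equals accuracy@k divided by k (for example 0.7616 / 3 ≈ 0.2539). That pattern is consistent with each query having exactly one relevant passage, which is the assumption behind the following sketch. The function names and the toy rankings are hypothetical; this is not the evaluator's own code.

```python
import math

def retrieval_metrics(ranked_ids, relevant_id, k):
    """Metrics for one query, assuming exactly one relevant document."""
    top_k = ranked_ids[:k]
    hit = relevant_id in top_k
    rank = top_k.index(relevant_id) + 1 if hit else None
    return {
        "accuracy": 1.0 if hit else 0.0,
        "precision": (1.0 / k) if hit else 0.0,
        "recall": 1.0 if hit else 0.0,
        "mrr": (1.0 / rank) if hit else 0.0,
        # With one relevant document the ideal DCG is 1, so NDCG equals DCG.
        "ndcg": (1.0 / math.log2(rank + 1)) if hit else 0.0,
    }

def average(rows, key):
    return sum(row[key] for row in rows) / len(rows)

# Hypothetical rankings: (ranked document ids, id of the single relevant document).
queries = [
    (["d1", "d2", "d3"], "d1"),  # relevant document at rank 1
    (["d4", "d5", "d6"], "d6"),  # relevant document at rank 3
    (["d7", "d8", "d9"], "d0"),  # relevant document not retrieved
]
per_query = [retrieval_metrics(ranked, rel, k=3) for ranked, rel in queries]
summary = {key: average(per_query, key) for key in per_query[0]}
```

Under this assumption the precision, recall and accuracy rows carry the same information, so NDCG@10, MRR@10 and MAP@100 are the more informative rows when comparing dimensions.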
Columns: anchor and positive

| | anchor | positive |
|---|---|---|
| type | string | string |
| details | | |

Samples:

| anchor | positive |
|---|---|
| How must the District Court exercise its discretion? | imposition of ‘ a’ term; (5) It is not mandatory to impose security, as evinced by the use of the conjunction “or”; (6) In imposing terms, the District Court must be mindful of the objectives of the Act, and its discretion must be exercised judicially |
| What is the source of the observation made by Christian Appu? | . Christian Appu , (1895) 1 NLR 288 observed that , “possession is "disturbed" either by an action intended to remove the possessor from the land, or by acts which prevent the possessor from enjoying the free and full use of 12 the land of which he is in the course of acquiring the dominion, and which convert his continuous user into a disconnected and divided user ” |
| What must the defendant do regarding the plaintiff's claim? | . The Court of Appeal in Ramanayake v Sampath Bank Ltd and Others [(1993) 1 Sri LR 145 at page 153] has held that, “The defendant has to deal with the plaintiff’s claim on its merits; it is not competent for the defendant to merely set out technical objections. It is also incumbent on the defendant to reveal his defence, if he has any |

MatryoshkaLoss with these parameters:
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        768,
        512,
        256,
        128,
        64
    ],
    "matryoshka_weights": [
        1,
        1,
        1,
        1,
        1
    ],
    "n_dims_per_step": -1
}
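Conceptually, these parameters describe an in-batch ranking loss evaluated on each truncated prefix of the embeddings and combined with the listed weights. The pure-Python sketch below illustrates that idea and is not the library's implementation. Assumptions: a similarity scale of 20.0, cosine similarity as the score, all dimensions used at every step (the assumed meaning of `n_dims_per_step: -1`), and toy 4-dimensional vectors in place of 768-dimensional ones. The function names are hypothetical.

```python
import math

def normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled cosine similarities.

    For anchor i, positive i is the target and every other positive
    in the batch acts as a negative. `scale` is an assumed value.
    """
    anchors = [normalize(a) for a in anchors]
    positives = [normalize(p) for p in positives]
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * sum(x * y for x, y in zip(anchor, p)) for p in positives]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over truncated embedding prefixes."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        total += weight * in_batch_ranking_loss(
            [a[:dim] for a in anchors],
            [p[:dim] for p in positives],
        )
    return total

# Toy batch (assumption): two anchor/positive pairs of width 4.
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]
loss = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
```

With equal weights, every prefix length contributes equally to the objective, which encourages the leading components to remain useful on their own.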
Non-default hyperparameters:

- eval_strategy: epoch
- per_device_train_batch_size: 16
- gradient_accumulation_steps: 8
- learning_rate: 2e-05
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- tf32: True
- load_best_model_at_end: True
- optim: adamw_torch_fused
- batch_sampler: no_duplicates

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 8
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: True
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional

| Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|---|---|---|---|---|---|---|---|
| 0.1034 | 5 | 29.8712 | - | - | - | - | - |
| 0.2067 | 10 | 26.1323 | - | - | - | - | - |
| 0.3101 | 15 | 17.8585 | - | - | - | - | - |
| 0.4134 | 20 | 14.0232 | - | - | - | - | - |
| 0.5168 | 25 | 11.6897 | - | - | - | - | - |
| 0.6202 | 30 | 10.8431 | - | - | - | - | - |
| 0.7235 | 35 | 9.264 | - | - | - | - | - |
| 0.8269 | 40 | 11.2186 | - | - | - | - | - |
| 0.9302 | 45 | 9.9143 | - | - | - | - | - |
| 1.0 | 49 | - | 0.7134 | 0.7110 | 0.6902 | 0.6341 | 0.5282 |
| 1.0207 | 50 | 7.2581 | - | - | - | - | - |
| 1.1240 | 55 | 6.066 | - | - | - | - | - |
| 1.2274 | 60 | 6.3626 | - | - | - | - | - |
| 1.3307 | 65 | 6.8135 | - | - | - | - | - |
| 1.4341 | 70 | 5.5556 | - | - | - | - | - |
| 1.5375 | 75 | 6.0144 | - | - | - | - | - |
| 1.6408 | 80 | 6.1965 | - | - | - | - | - |
| 1.7442 | 85 | 5.596 | - | - | - | - | - |
| 1.8475 | 90 | 6.631 | - | - | - | - | - |
| 1.9509 | 95 | 6.3319 | - | - | - | - | - |
| 2.0 | 98 | - | 0.7331 | 0.7304 | 0.7074 | 0.6569 | 0.5477 |
| 2.0413 | 100 | 4.7382 | - | - | - | - | - |
| 2.1447 | 105 | 4.1516 | - | - | - | - | - |
| 2.2481 | 110 | 4.3517 | - | - | - | - | - |
| 2.3514 | 115 | 3.7044 | - | - | - | - | - |
| 2.4548 | 120 | 4.1593 | - | - | - | - | - |
| 2.5581 | 125 | 4.8081 | - | - | - | - | - |
| 2.6615 | 130 | 3.908 | - | - | - | - | - |
| 2.7649 | 135 | 3.7684 | - | - | - | - | - |
| 2.8682 | 140 | 3.8927 | - | - | - | - | - |
| 2.9509 | 144 | - | 0.7308 | 0.7262 | 0.7078 | 0.6568 | 0.5514 |
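The learning-rate settings listed in the hyperparameters (`learning_rate: 2e-05`, `lr_scheduler_type: cosine`, `warmup_ratio: 0.1`) can be pictured with the sketch below: linear warmup to the peak rate, then cosine decay to zero. It is an illustration, not the trainer's scheduler. Assumptions: 144 total optimizer steps (the last step shown in the log above), a single training device, and warmup steps rounded up. The function name is hypothetical.

```python
import math

def cosine_lr_with_warmup(step, total_steps, peak_lr=2e-05, warmup_ratio=0.1):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

TOTAL_STEPS = 144  # assumption: last step shown in the training log
# per_device_train_batch_size * gradient_accumulation_steps, one device assumed
EFFECTIVE_BATCH = 16 * 8
schedule = [cosine_lr_with_warmup(s, TOTAL_STEPS) for s in range(TOTAL_STEPS + 1)]
```

Under these assumptions each optimizer step sees 128 anchor/positive pairs, and the learning rate peaks after roughly the first tenth of training.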
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model
answerdotai/ModernBERT-base