ModernBERT Embed base Legal Matryoshka

This is a sentence-transformers model finetuned from nomic-ai/modernbert-embed-base on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: nomic-ai/modernbert-embed-base
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
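
For reference, the three modules above (ModernBERT encoder, mean pooling, L2 normalization) can be reproduced with plain transformers. A minimal equivalent sketch, assuming only the standard AutoModel API; the input text is a hypothetical placeholder:

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

model_id = "ordersharelook/modernbert-embed-base-legal-matryoshka-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer(["a hypothetical legal passage"], padding=True, return_tensors="pt")
with torch.no_grad():
    token_embeddings = model(**inputs).last_hidden_state  # module (0): (batch, seq, 768)

# Module (1): mean pooling over non-padding tokens
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# Module (2): L2 normalization, so dot product equals cosine similarity
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # (1, 768)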

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ordersharelook/modernbert-embed-base-legal-matryoshka-2")
# Run inference
sentences = [
    'itself be a ground for reversal—i.e., for his winning a reversal on appeal. In light of this \nanomaly, we cannot find that a lack of stated reasons constitutes an abuse of discretion here. \n¶ 36 \n \nWhere no transcript or bystander’s report of the proceedings was provided to us, and where \nno reasons were stated in the order itself, we presume that the trial court acted appropriately',
    'What is presumed about the trial court due to the absence of a transcript or bystander’s report?',
    'What type of costs does the payment structure include predominantly?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
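
Because the model was trained with MatryoshkaLoss (see Training Details), embeddings can also be truncated to 512, 256, 128, or 64 dimensions at a modest accuracy cost (see Evaluation). A minimal sketch using the truncate_dim option of Sentence Transformers:

from sentence_transformers import SentenceTransformer

# Load with embeddings truncated to one of the trained Matryoshka dimensions
model = SentenceTransformer(
    "ordersharelook/modernbert-embed-base-legal-matryoshka-2",
    truncate_dim=256,
)
embeddings = model.encode(["a hypothetical legal passage"])
print(embeddings.shape)
# (1, 256)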

Evaluation

Metrics

The five tables below report the same retrieval metrics with embeddings truncated to each Matryoshka dimension: 768, 512, 256, 128, and 64, respectively (matching the dim_* evaluators in the Training Logs).

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.524
cosine_accuracy@3 0.5688
cosine_accuracy@5 0.6631
cosine_accuracy@10 0.7326
cosine_precision@1 0.524
cosine_precision@3 0.4997
cosine_precision@5 0.3824
cosine_precision@10 0.2243
cosine_recall@1 0.1841
cosine_recall@3 0.4921
cosine_recall@5 0.6149
cosine_recall@10 0.7152
cosine_ndcg@10 0.6248
cosine_mrr@10 0.57
cosine_map@100 0.6114

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.5162
cosine_accuracy@3 0.5518
cosine_accuracy@5 0.6383
cosine_accuracy@10 0.7094
cosine_precision@1 0.5162
cosine_precision@3 0.4874
cosine_precision@5 0.3679
cosine_precision@10 0.2189
cosine_recall@1 0.1817
cosine_recall@3 0.4812
cosine_recall@5 0.5922
cosine_recall@10 0.6984
cosine_ndcg@10 0.6107
cosine_mrr@10 0.5574
cosine_map@100 0.5987

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.4822
cosine_accuracy@3 0.5317
cosine_accuracy@5 0.6167
cosine_accuracy@10 0.694
cosine_precision@1 0.4822
cosine_precision@3 0.4616
cosine_precision@5 0.3564
cosine_precision@10 0.213
cosine_recall@1 0.1698
cosine_recall@3 0.4545
cosine_recall@5 0.5719
cosine_recall@10 0.6793
cosine_ndcg@10 0.5855
cosine_mrr@10 0.5285
cosine_map@100 0.5712

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.442
cosine_accuracy@3 0.473
cosine_accuracy@5 0.5611
cosine_accuracy@10 0.643
cosine_precision@1 0.442
cosine_precision@3 0.4189
cosine_precision@5 0.3224
cosine_precision@10 0.1983
cosine_recall@1 0.1548
cosine_recall@3 0.4123
cosine_recall@5 0.5179
cosine_recall@10 0.6315
cosine_ndcg@10 0.539
cosine_mrr@10 0.4835
cosine_map@100 0.5243

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.3539
cosine_accuracy@3 0.3849
cosine_accuracy@5 0.4467
cosine_accuracy@10 0.524
cosine_precision@1 0.3539
cosine_precision@3 0.34
cosine_precision@5 0.2612
cosine_precision@10 0.1617
cosine_recall@1 0.1227
cosine_recall@3 0.3323
cosine_recall@5 0.4128
cosine_recall@10 0.5093
cosine_ndcg@10 0.4342
cosine_mrr@10 0.3893
cosine_map@100 0.4293
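
Metrics of this form are produced by sentence-transformers' InformationRetrievalEvaluator. A minimal sketch with hypothetical toy data (the card's figures come from the model's own evaluation queries and passages, not from this toy set):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("ordersharelook/modernbert-embed-base-legal-matryoshka-2")

# Hypothetical toy data: query id -> text, doc id -> text, query id -> relevant doc ids
queries = {"q1": "What is presumed about the trial court?"}
corpus = {
    "d1": "we presume that the trial court acted appropriately",
    "d2": "the payment structure includes predominantly fixed costs",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="toy")
results = evaluator(model)
print(results)  # includes toy_cosine_accuracy@1, toy_cosine_ndcg@10, ...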

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 5,822 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    • positive: string; min: 28 tokens, mean: 97.37 tokens, max: 158 tokens
    • anchor: string; min: 7 tokens, mean: 16.47 tokens, max: 31 tokens
  • Samples:
    • positive: privilege because the plaintiff says that these five opinions have been officially disclosed in the public domain.70 See Pl.’s First 445 Opp’n at 32–33. Similar to the plaintiff’s argument above as to the CIA’s Exemption 1 withholdings, see supra Part III.F.1, the plaintiff contends that “[t]his evidence casts significant doubt on the good faith of OLC, and the Court should order
      anchor: What is similar to the plaintiff’s argument about the CIA's Exemption 1 withholdings?
    • positive: The first issue we must resolve is whether, as plaintiff argues, we lack jurisdiction to hear this appeal. People v. Brindley, 2017 IL App (5th) 160189, ¶ 14 (“[t]he first issue we must address is the jurisdiction of this court to hear” the appeal). ¶ 21 A. Court of Limited Jurisdiction ¶ 22
      anchor: What paragraph immediately follows the mention of jurisdiction?
    • positive: met its burden to show that Senetas is a competitor to DR. The potential for a “joint collaboration” between Senetas and DR does not necessarily mean they are competitors. Senetas operates in a different market than DR and there is no 69 Senetas’ May 11, 2017 press release announcing its investment in DR includes a quote
      anchor: On what date did Senetas release a press statement about its investment in DR?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
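
In code, this configuration corresponds to wrapping MultipleNegativesRankingLoss in MatryoshkaLoss. A minimal sketch, assuming the training pairs live in a JSON Lines file (train.jsonl is a hypothetical path) with the positive and anchor columns described above:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("nomic-ai/modernbert-embed-base")
train_dataset = load_dataset("json", data_files="train.jsonl", split="train")  # hypothetical path

# Each step computes the inner loss on embeddings truncated to every listed dimension
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(
    model,
    inner_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
    n_dims_per_step=-1,  # -1 = train on all dimensions at every step
)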
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
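
These settings map directly onto SentenceTransformerTrainingArguments. A minimal sketch; output_dir is a hypothetical path, and save_strategy="epoch" is an assumption (load_best_model_at_end requires the save and eval strategies to match):

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="modernbert-embed-base-legal-matryoshka-2",  # hypothetical
    num_train_epochs=4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    tf32=True,
    eval_strategy="epoch",
    save_strategy="epoch",  # assumption: must match eval_strategy for load_best_model_at_end
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoids duplicate texts within a batch
)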

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.8791 10 5.6176 - - - - -
1.0 12 - 0.5899 0.5777 0.5526 0.4889 0.3844
1.7033 20 2.4277 - - - - -
2.0 24 - 0.6201 0.6050 0.5781 0.5215 0.4136
2.5275 30 1.8308 - - - - -
3.0 36 - 0.6248 0.6075 0.5845 0.5373 0.4347
3.3516 40 1.5394 - - - - -
4.0 48 - 0.6248 0.6107 0.5855 0.539 0.4342
  • The saved checkpoint is the final row (epoch 4.0, step 48), whose scores match the evaluation tables above.
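
A run like this can be reproduced with SentenceTransformerTrainer, combining the dataset, loss, arguments, and evaluator sketched in the sections above (the variable names refer to those sketches):

from sentence_transformers import SentenceTransformerTrainer

trainer = SentenceTransformerTrainer(
    model=model,                  # the base model wrapped by the loss sketch above
    args=args,                    # the SentenceTransformerTrainingArguments sketch
    train_dataset=train_dataset,  # the positive/anchor pairs
    loss=loss,                    # MatryoshkaLoss over MultipleNegativesRankingLoss
    evaluator=evaluator,          # e.g. an InformationRetrievalEvaluator per dimension
)
trainer.train()
trainer.save_model()  # saves the loaded best checkpoint (load_best_model_at_end=True)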

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 4.1.0
  • Transformers: 4.53.2
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.9.0
  • Datasets: 4.0.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}