ModernBERT Embed base Legal Matryoshka

This is a sentence-transformers model finetuned from nomic-ai/modernbert-embed-base on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: nomic-ai/modernbert-embed-base
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
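The pooling stage above averages the token embeddings into a single vector, and the final stage L2-normalizes it so that dot products equal cosine similarities. A minimal numpy sketch of that pipeline (the token embeddings here are random placeholders, not real model outputs):

```python
import numpy as np

# Placeholder token embeddings for one sentence: 5 tokens x 768 dims
token_embeddings = np.random.rand(5, 768)

# (1) Mean pooling: average over the token axis -> one 768-dim vector
pooled = token_embeddings.mean(axis=0)

# (2) Normalize: scale to unit L2 norm, so a dot product between two
#     embeddings is their cosine similarity
embedding = pooled / np.linalg.norm(pooled)

print(embedding.shape)  # (768,)
```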

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("iamkpi/modernbert-embed-base-legal-matryoshka-2")
# Run inference
sentences = [
    'the dispensary, where he went after he was shot.  \nAs a witness for the State, Detective Victor Liu of the Baltimore Police Department \ntestified that, on September 3, 2021, he responded to a report of “a shooting incident in the \n3900 block of Falls Road.”  There, Detective Liu saw an SUV with bullet holes in the back',
    'What did Detective Liu see at the scene of the shooting incident?',
    'Is the Commission considered an agency under § 551(1)?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
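Because the model was trained with MatryoshkaLoss (dimensions 768, 512, 256, 128, and 64; see Training Details), embeddings can be truncated to a smaller dimension and re-normalized with a modest quality trade-off. Sentence Transformers supports this directly via the `truncate_dim` constructor argument, e.g. `SentenceTransformer("iamkpi/modernbert-embed-base-legal-matryoshka-2", truncate_dim=256)`. The equivalent post-hoc truncation, sketched here with placeholder numpy vectors standing in for `model.encode(...)` output:

```python
import numpy as np

def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components, then re-normalize to unit length
    so cosine similarity can still be computed as a dot product."""
    truncated = embeddings[:, :dim]
    return truncated / np.linalg.norm(truncated, axis=1, keepdims=True)

# Placeholder embeddings standing in for model.encode(sentences)
full = np.random.rand(3, 768)
full /= np.linalg.norm(full, axis=1, keepdims=True)

small = truncate_and_normalize(full, 256)
print(small.shape)  # (3, 256)

# Cosine similarity on the truncated vectors is just a matrix product
similarities = small @ small.T
print(similarities.shape)  # (3, 3)
```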

Evaluation

Metrics

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.5209
cosine_accuracy@3 0.5873
cosine_accuracy@5 0.7002
cosine_accuracy@10 0.7651
cosine_precision@1 0.5209
cosine_precision@3 0.4956
cosine_precision@5 0.3913
cosine_precision@10 0.2331
cosine_recall@1 0.1928
cosine_recall@3 0.5012
cosine_recall@5 0.6439
cosine_recall@10 0.7571
cosine_ndcg@10 0.6492
cosine_mrr@10 0.5807
cosine_map@100 0.6256

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.5317
cosine_accuracy@3 0.5765
cosine_accuracy@5 0.6739
cosine_accuracy@10 0.7558
cosine_precision@1 0.5317
cosine_precision@3 0.4992
cosine_precision@5 0.383
cosine_precision@10 0.2295
cosine_recall@1 0.1937
cosine_recall@3 0.5009
cosine_recall@5 0.6256
cosine_recall@10 0.7465
cosine_ndcg@10 0.6441
cosine_mrr@10 0.5823
cosine_map@100 0.6232

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.4853
cosine_accuracy@3 0.5363
cosine_accuracy@5 0.6306
cosine_accuracy@10 0.7156
cosine_precision@1 0.4853
cosine_precision@3 0.456
cosine_precision@5 0.3518
cosine_precision@10 0.2133
cosine_recall@1 0.1788
cosine_recall@3 0.4624
cosine_recall@5 0.5791
cosine_recall@10 0.6971
cosine_ndcg@10 0.5963
cosine_mrr@10 0.5358
cosine_map@100 0.5799

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.4142
cosine_accuracy@3 0.4683
cosine_accuracy@5 0.5502
cosine_accuracy@10 0.6476
cosine_precision@1 0.4142
cosine_precision@3 0.3962
cosine_precision@5 0.3116
cosine_precision@10 0.1944
cosine_recall@1 0.1495
cosine_recall@3 0.3967
cosine_recall@5 0.5081
cosine_recall@10 0.6319
cosine_ndcg@10 0.5275
cosine_mrr@10 0.4655
cosine_map@100 0.5099

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.2952
cosine_accuracy@3 0.3416
cosine_accuracy@5 0.4142
cosine_accuracy@10 0.493
cosine_precision@1 0.2952
cosine_precision@3 0.2849
cosine_precision@5 0.2291
cosine_precision@10 0.1485
cosine_recall@1 0.1081
cosine_recall@3 0.2885
cosine_recall@5 0.3776
cosine_recall@10 0.4825
cosine_ndcg@10 0.3937
cosine_mrr@10 0.3396
cosine_map@100 0.385
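The metrics above follow the standard information-retrieval definitions: accuracy@k is the fraction of queries with at least one relevant document in the top k results, precision@k is the fraction of the top k that is relevant, and recall@k is the fraction of all relevant documents retrieved in the top k. A minimal sketch with a made-up ranking for a single query:

```python
def accuracy_at_k(ranked, relevant, k):
    # 1.0 if any relevant doc appears in the top k, else 0.0
    return float(any(doc in relevant for doc in ranked[:k]))

def precision_at_k(ranked, relevant, k):
    # fraction of the top k results that are relevant
    return sum(doc in relevant for doc in ranked[:k]) / k

def recall_at_k(ranked, relevant, k):
    # fraction of all relevant docs that appear in the top k
    return sum(doc in relevant for doc in ranked[:k]) / len(relevant)

# Hypothetical ranking for one query with three relevant docs
ranked = ["d3", "d7", "d1", "d9", "d2"]
relevant = {"d3", "d2", "d8"}

print(accuracy_at_k(ranked, relevant, 1))   # 1.0 (d3 is relevant)
print(precision_at_k(ranked, relevant, 5))  # 0.4 (d3 and d2 in the top 5)
print(recall_at_k(ranked, relevant, 5))     # 2/3 of relevant docs retrieved
```

The reported numbers average these per-query values over the evaluation set.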

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 5,822 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    • positive: string; min 26, mean 96.36, max 170 tokens
    • anchor: string; min 8, mean 16.47, max 32 tokens
  • Samples:
    • positive: the same time they would if they were assigned the original requests.” Id. at 11–12. The plaintiff responds by focusing on the factual underpinnings of the CIA’s policy arguments—in particular the CIA’s contentions about “undue burden.” See Pl.’s 443 Cross-Mot. Mem. at 2–7. For example, the plaintiff points out that the CIA waives FOIA fees “‘as an act of
      anchor: What is one argument the plaintiff critiques regarding the CIA's policy?
    • positive: contends that, “[i]n order to be properly withheld [under Exemption 2], the information must be of a relatively trivial nature.” Id. (citing Dep’t of Air Force v. Rose, 425 U.S. 352, 369–70 (1976) and Lesar v. DOJ, 636 F.2d 472, 485 (D.C. Cir. 1980)). This triviality requirement applies, according to plaintiff, because the rationale for Exemption 2 is “that the very task of
      anchor: What does the plaintiff assert as the rationale for Exemption 2?
    • positive: the shooting.2 The video was 1 minute and 51 seconds long. Before admission of the video, Mr. Zimmerman testified that, in the months prior to the shooting, he had suspected Mr. Mooney of sleeping with his girlfriend, but Mr. Mooney had denied the allegation. Mr. Zimmerman testified that, on the night of the
      anchor: What did Mr. Mooney do in response to the allegation?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
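Conceptually, MatryoshkaLoss evaluates the inner loss (here MultipleNegativesRankingLoss) on each truncated prefix of the embedding and sums the results using the weights above (all 1 in this configuration). A rough numpy sketch of that weighting scheme, with a dummy placeholder standing in for the real inner loss:

```python
import numpy as np

matryoshka_dims = [768, 512, 256, 128, 64]
matryoshka_weights = [1, 1, 1, 1, 1]

def inner_loss(embeddings: np.ndarray) -> float:
    # Placeholder for MultipleNegativesRankingLoss evaluated on the
    # truncated embeddings; returns a dummy scalar for illustration only.
    return float(np.mean(embeddings ** 2))

embeddings = np.random.rand(8, 768)  # a batch of full-size embeddings

# Weighted sum of the inner loss over each prefix dimension
total_loss = sum(
    w * inner_loss(embeddings[:, :d])
    for d, w in zip(matryoshka_dims, matryoshka_weights)
)
print(total_loss)
```

Training on every prefix simultaneously is what makes the truncated embeddings usable at inference time.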
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: False
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
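With a per-device batch size of 32 and 16 gradient-accumulation steps, the effective batch size is 512 (assuming a single device), which on the 5,822-sample training set gives 12 optimizer steps per epoch and 48 in total over 4 epochs, matching the training logs below; warmup_ratio 0.1 then corresponds to roughly the first 5 steps. The arithmetic:

```python
import math

per_device_train_batch_size = 32
gradient_accumulation_steps = 16
num_train_epochs = 4
warmup_ratio = 0.1
train_samples = 5822  # size of the json training set

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = math.ceil(train_samples / effective_batch_size)
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(effective_batch_size)  # 512
print(steps_per_epoch)       # 12
print(total_steps)           # 48
print(warmup_steps)          # 5
```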

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: False
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.8791 10 5.4341 - - - - -
1.0 12 - 0.5894 0.5880 0.5425 0.4581 0.3261
1.7033 20 2.535 - - - - -
2.0 24 - 0.6310 0.6275 0.5876 0.5039 0.3711
2.5275 30 1.854 - - - - -
3.0 36 - 0.6456 0.6400 0.5952 0.5206 0.3938
3.3516 40 1.7104 - - - - -
4.0 48 - 0.6492 0.6441 0.5963 0.5275 0.3937 ← saved checkpoint
  • The marked row (epoch 4) denotes the saved checkpoint.

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 4.1.0
  • Transformers: 4.54.0
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.9.0
  • Datasets: 4.0.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}