MiniLM-L6-H384-uncased trained on GooAQ

This is a Cross Encoder model finetuned from nreimers/MiniLM-L6-H384-uncased using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

Model Details

Model Description

  • Model Type: Cross Encoder
  • Base model: nreimers/MiniLM-L6-H384-uncased
  • Maximum Sequence Length: 512 tokens
  • Number of Output Labels: 1 label
  • Language: en
  • License: apache-2.0

Model Sources

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("ayushexel/reranker-MiniLM-L6-H384-uncased-gooaq-bce-495000")
# Get scores for pairs of texts
pairs = [
    ["in grey's anatomy how does izzie die?", 'After speculation that Izzie would be killed off in the fifth season, the character was diagnosed with Stage 4 metastatic melanoma.'],
    ["in grey's anatomy how does izzie die?", "Izzie later admitted to George that she was in love with him, leaving him speechless. George later admitted he loved Izzie too, despite his strange reaction to her when she confessed her love to him. Their relationship was soon discovered by George's wife, Callie and the two got a divorce."],
    ["in grey's anatomy how does izzie die?", "The episode in which Derek Shepherd (Patrick Dempsey) dies is one that most Grey's Anatomy fans will never forget. The fateful incident occurred in season 11, episode 21, and it was titled, “How To Save a Life.” The attending doctor who failed to save McDreamy's life recently appeared in an episode of Grey's Anatomy."],
    ["in grey's anatomy how does izzie die?", "Richard Webber, Grey's Anatomy fans are nervous he'll die, though nothing is set in stone on the show yet. Warning: Spoilers for Season 16, Episode 19 of Grey's Anatomy follow."],
    ["in grey's anatomy how does izzie die?", "Izzie eventually forgives him, and they begin dating again until Denny enters the picture. After Denny's death they begin dating yet again and following her recovery from cancer they get married, but it doesn't last."],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    "in grey's anatomy how does izzie die?",
    [
        'After speculation that Izzie would be killed off in the fifth season, the character was diagnosed with Stage 4 metastatic melanoma.',
        "Izzie later admitted to George that she was in love with him, leaving him speechless. George later admitted he loved Izzie too, despite his strange reaction to her when she confessed her love to him. Their relationship was soon discovered by George's wife, Callie and the two got a divorce.",
        "The episode in which Derek Shepherd (Patrick Dempsey) dies is one that most Grey's Anatomy fans will never forget. The fateful incident occurred in season 11, episode 21, and it was titled, “How To Save a Life.” The attending doctor who failed to save McDreamy's life recently appeared in an episode of Grey's Anatomy.",
        "Richard Webber, Grey's Anatomy fans are nervous he'll die, though nothing is set in stone on the show yet. Warning: Spoilers for Season 16, Episode 19 of Grey's Anatomy follow.",
        "Izzie eventually forgives him, and they begin dating again until Denny enters the picture. After Denny's death they begin dating yet again and following her recovery from cancer they get married, but it doesn't last.",
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
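
The list returned by rank pairs each passage index with its score, ordered from most to least relevant. The sketch below reproduces that post-processing in plain Python, starting from an array of scores such as the one predict returns. It is an illustration of the output shape, not the library's implementation; the helper name rank_by_score and the score values are hypothetical.

```python
def rank_by_score(scores):
    """Return [{'corpus_id': i, 'score': s}, ...] sorted by descending score."""
    # Sort passage indices so that the highest-scoring passage comes first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [{"corpus_id": i, "score": float(scores[i])} for i in order]


# Hypothetical scores for five passages (illustrative values, not model output).
example_scores = [0.91, 0.12, 0.05, 0.03, 0.40]
ranked = rank_by_score(example_scores)
print([entry["corpus_id"] for entry in ranked])  # [0, 4, 1, 2, 3]
```

Each corpus_id indexes into the list of passages that was passed in, so the original text can be recovered with a simple lookup.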

Evaluation

Metrics

Cross Encoder Reranking

Metric   Value
map      0.5291 (+0.1486)
mrr@10   0.5258 (+0.1553)
ndcg@10  0.5805 (+0.1477)
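
For readers unfamiliar with these metrics, the following sketch computes them for a single query from a ranked list of binary relevance labels, using the standard textbook definitions. The evaluator averages such per-query values over all queries, and its exact implementation may differ in details; the function names and the example labels here are hypothetical.

```python
import math


def mrr_at_k(rels, k=10):
    """Reciprocal rank of the first relevant item within the top k, else 0."""
    for rank, rel in enumerate(rels[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(rels, k=10):
    """DCG of the top k divided by the DCG of an ideal ordering."""
    def dcg(items):
        return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(items, start=1))

    ideal = sorted(rels, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(rels[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0


def average_precision(rels):
    """Mean of precision@rank taken at each rank that holds a relevant item."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical ranking: relevant passages at positions 2 and 4.
rels = [0, 1, 0, 1, 0]
print(mrr_at_k(rels))           # 0.5
print(average_precision(rels))  # 0.5
print(round(ndcg_at_k(rels), 4))
```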

Cross Encoder Reranking

  • Datasets: NanoMSMARCO_R100, NanoNFCorpus_R100 and NanoNQ_R100
  • Evaluated with CrossEncoderRerankingEvaluator with these parameters:
    {
        "at_k": 10,
        "always_rerank_positives": true
    }
    
Metric   NanoMSMARCO_R100  NanoNFCorpus_R100  NanoNQ_R100
map      0.2939 (-0.1956)  0.3242 (+0.0632)   0.2769 (-0.1427)
mrr@10   0.2772 (-0.2003)  0.5253 (+0.0255)   0.2629 (-0.1638)
ndcg@10  0.3678 (-0.1726)  0.3345 (+0.0095)   0.3325 (-0.1682)

Cross Encoder Nano BEIR

  • Dataset: NanoBEIR_R100_mean
  • Evaluated with CrossEncoderNanoBEIREvaluator with these parameters:
    {
        "dataset_names": [
            "msmarco",
            "nfcorpus",
            "nq"
        ],
        "rerank_k": 100,
        "at_k": 10,
        "always_rerank_positives": true
    }
    
Metric   Value
map      0.2984 (-0.0917)
mrr@10   0.3552 (-0.1128)
ndcg@10  0.3449 (-0.1104)
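
The NanoBEIR_R100_mean values appear to be the unweighted mean of the three per-dataset results reported in this card. The short check below recomputes them from the rounded per-dataset figures; the small differences in the fourth decimal are consistent with the card averaging unrounded values, which is an assumption.

```python
# Per-dataset values copied from this card, in the order
# NanoMSMARCO_R100, NanoNFCorpus_R100, NanoNQ_R100.
per_dataset = {
    "map": [0.2939, 0.3242, 0.2769],
    "mrr@10": [0.2772, 0.5253, 0.2629],
    "ndcg@10": [0.3678, 0.3345, 0.3325],
}
# Mean values reported for NanoBEIR_R100_mean.
reported_mean = {"map": 0.2984, "mrr@10": 0.3552, "ndcg@10": 0.3449}

recomputed = {name: sum(vals) / len(vals) for name, vals in per_dataset.items()}
for name, value in recomputed.items():
    print(f"{name}: recomputed {value:.4f}, reported {reported_mean[name]:.4f}")
```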

Training Details

Training Dataset

Unnamed Dataset

  • Size: 2,749,365 training samples
  • Columns: question, answer, and label
  • Approximate statistics based on the first 1000 samples:
    • question (string): min 19 characters, mean 42.17 characters, max 79 characters
    • answer (string): min 54 characters, mean 246.01 characters, max 399 characters
    • label (int): 0: ~81.90%, 1: ~18.10%
  • Samples:
    • question: in grey's anatomy how does izzie die?
      answer: After speculation that Izzie would be killed off in the fifth season, the character was diagnosed with Stage 4 metastatic melanoma.
      label: 1
    • question: in grey's anatomy how does izzie die?
      answer: Izzie later admitted to George that she was in love with him, leaving him speechless. George later admitted he loved Izzie too, despite his strange reaction to her when she confessed her love to him. Their relationship was soon discovered by George's wife, Callie and the two got a divorce.
      label: 0
    • question: in grey's anatomy how does izzie die?
      answer: The episode in which Derek Shepherd (Patrick Dempsey) dies is one that most Grey's Anatomy fans will never forget. The fateful incident occurred in season 11, episode 21, and it was titled, “How To Save a Life.” The attending doctor who failed to save McDreamy's life recently appeared in an episode of Grey's Anatomy.
      label: 0
  • Loss: BinaryCrossEntropyLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "pos_weight": 5
    }
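
To make the effect of pos_weight concrete, here is a minimal pure-Python sketch of a per-example binary cross-entropy on raw logits with a weight on the positive term. It assumes the loss follows the standard weighted BCE-with-logits form, which is consistent with the Identity activation listed above; the function name is hypothetical. With roughly 18% positives in the sampled data, a weight of 5 approximately balances the contribution of the two classes.

```python
import math


def weighted_bce_with_logits(logit, label, pos_weight=5.0):
    """-[pos_weight * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]."""
    # Numerically stable log(sigmoid(x)).
    if logit >= 0:
        log_sig = -math.log1p(math.exp(-logit))
    else:
        log_sig = logit - math.log1p(math.exp(logit))
    # log(1 - sigmoid(x)) = log(sigmoid(x)) - x
    log_one_minus_sig = log_sig - logit
    return -(pos_weight * label * log_sig + (1 - label) * log_one_minus_sig)


# Loss of an uninformed model (logit 0 everywhere) under the label mix
# reported for the first 1000 samples: ~18.10% positives, ~81.90% negatives.
uninformed_loss = (
    0.181 * weighted_bce_with_logits(0.0, 1)
    + 0.819 * weighted_bce_with_logits(0.0, 0)
)
print(round(uninformed_loss, 3))  # 1.195
```

Under these assumptions the uninformed loss of about 1.195 is close to the training loss logged in the first few hundred steps, which suggests the logged value is a mean over examples.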
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • seed: 12
  • bf16: True
  • dataloader_num_workers: 12
  • load_best_model_at_end: True
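
As a worked example of how these settings interact, the sketch below derives the number of optimizer steps and the learning rate at a given step for a linear schedule with warmup (lr_scheduler_type is linear in the full list). It assumes a single training device and no gradient accumulation; the device count is not stated in this card, although the resulting step count matches the epoch fractions in the training logs.

```python
import math

TRAIN_SAMPLES = 2_749_365   # training set size from this card
BATCH_SIZE = 256            # per_device_train_batch_size
NUM_DEVICES = 1             # assumption: not stated in the card
EPOCHS = 1                  # num_train_epochs
PEAK_LR = 2e-05             # learning_rate
WARMUP_RATIO = 0.1          # warmup_ratio

# The last partial batch is kept because dataloader_drop_last is False.
total_steps = math.ceil(TRAIN_SAMPLES / (BATCH_SIZE * NUM_DEVICES)) * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step):
    """Linear warmup from 0 to PEAK_LR, then linear decay back to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - warmup_steps)


print(total_steps, warmup_steps)  # 10740 1074
print(lr_at(warmup_steps))        # peak learning rate
```

With these numbers, step 200 corresponds to 200 / 10740 of an epoch, which agrees with the 0.0186 shown next to step 200 in the training logs.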

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 12
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 12
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss gooaq-dev_ndcg@10 NanoMSMARCO_R100_ndcg@10 NanoNFCorpus_R100_ndcg@10 NanoNQ_R100_ndcg@10 NanoBEIR_R100_mean_ndcg@10
-1 -1 - 0.1141 (-0.3187) 0.0667 (-0.4737) 0.2984 (-0.0267) 0.0318 (-0.4689) 0.1323 (-0.3231)
0.0001 1 1.2808 - - - - -
0.0186 200 1.196 - - - - -
0.0372 400 1.1939 - - - - -
0.0559 600 1.1823 - - - - -
0.0745 800 1.1506 - - - - -
0.0931 1000 0.9972 - - - - -
0.1117 1200 0.9336 - - - - -
0.1304 1400 0.898 - - - - -
0.1490 1600 0.8582 - - - - -
0.1676 1800 0.8391 - - - - -
0.1862 2000 0.8153 - - - - -
0.2048 2200 0.7999 - - - - -
0.2235 2400 0.7793 - - - - -
0.2421 2600 0.7889 - - - - -
0.2607 2800 0.7576 - - - - -
0.2793 3000 0.7592 - - - - -
0.2980 3200 0.7543 - - - - -
0.3166 3400 0.7437 - - - - -
0.3352 3600 0.7426 - - - - -
0.3538 3800 0.7337 - - - - -
0.3724 4000 0.7312 - - - - -
0.3911 4200 0.7212 - - - - -
0.4097 4400 0.7281 - - - - -
0.4283 4600 0.7166 - - - - -
0.4469 4800 0.7167 - - - - -
0.4655 5000 0.7175 - - - - -
0.4842 5200 0.7176 - - - - -
0.5028 5400 0.7141 - - - - -
0.5214 5600 0.6963 - - - - -
0.5400 5800 0.6888 - - - - -
0.5587 6000 0.6937 - - - - -
0.5773 6200 0.7009 - - - - -
0.5959 6400 0.6887 - - - - -
0.6145 6600 0.6933 - - - - -
0.6331 6800 0.692 - - - - -
0.6518 7000 0.6874 - - - - -
0.6704 7200 0.6792 - - - - -
0.6890 7400 0.6772 - - - - -
0.7076 7600 0.6804 - - - - -
0.7263 7800 0.6728 - - - - -
0.7449 8000 0.6703 - - - - -
0.7635 8200 0.6844 - - - - -
0.7821 8400 0.6663 - - - - -
0.8007 8600 0.6775 - - - - -
0.8194 8800 0.6647 - - - - -
0.8380 9000 0.6818 - - - - -
0.8566 9200 0.6724 - - - - -
0.8752 9400 0.6748 - - - - -
0.8939 9600 0.6567 - - - - -
0.9125 9800 0.6682 - - - - -
0.9311 10000 0.6747 - - - - -
0.9497 10200 0.6618 - - - - -
0.9683 10400 0.6625 - - - - -
0.9870 10600 0.6629 - - - - -
-1 -1 - 0.5805 (+0.1477) 0.3678 (-0.1726) 0.3345 (+0.0095) 0.3325 (-0.1682) 0.3449 (-0.1104)

Framework Versions

  • Python: 3.11.0
  • Sentence Transformers: 4.0.1
  • Transformers: 4.50.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}