CrossEncoder based on sentence-transformers/all-MiniLM-L6-v2

This is a Cross Encoder model fine-tuned from sentence-transformers/all-MiniLM-L6-v2 on the ms_marco dataset using the sentence-transformers library. It computes a relevance score for each pair of texts, which can be used for text reranking and semantic search.

Model Details

Model Description

Model Sources

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("skfrost19/reranker-msmarco-v1.1-Lion-all-MiniLM-L6-v2-bce")
# Get scores for pairs of texts
pairs = [
    ['what is a ladyfinger', 'A light, delicate sponge cake roughly shaped like a large, fat finger. Used as an accompaniment to ice cream, puddings and other desserts. One of the oldest and most delicate of sponge cakes, from the House of Savoy in eleventh century France. American ladyfingers are smaller and moister than their Italian counterparts.'],
    ['what is itp blood disorder', "Immune thrombocytopenia (THROM-bo-si-toe-PE-ne-ah), or ITP, is a bleeding disorder. In ITP, the blood doesn't clot as it should. This is due to a low number of blood cell fragments called platelets (PLATE-lets) or thrombocytes (THROM-bo-sites). Platelets are made in your bone marrow along with other kinds of blood cells."],
    ['what is spark programming', 'Spark 1.5.1 works with Java 7 and higher. If you are using Java 8, Spark supports lambda expressions for concisely writing functions, otherwise you can use the classes in the org.apache.spark.api.java.function package. To write a Spark application in Java, you need to add a dependency on Spark. The first thing a Spark program must do is to create a JavaSparkContext object, which tells Spark how to access a cluster. To create a SparkContext you first need to build a SparkConf object that contains information about your application.'],
    ['cost to replace a driveway', "The average national cost of a driveway installation is $3,647, with most homeowners spending between $2,026 and $5,278. This data is based on actual project costs as reported by HomeAdvisor members. If you're thinking about installing a driveway, it's important to consider a couple of different things. "],
    ['is quaker oatmeal gluten-free', 'Gluten-Free Confidence Score: 5/10. While pure oats themselves are technically gluten-free, cross contamination can occur while growing, storing, and processing of oats. It is because of this that Quaker Oats cannot guarantee that their oats in their oatmeal are truly gluten-free.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'what is a ladyfinger',
    [
        'A light, delicate sponge cake roughly shaped like a large, fat finger. Used as an accompaniment to ice cream, puddings and other desserts. One of the oldest and most delicate of sponge cakes, from the House of Savoy in eleventh century France. American ladyfingers are smaller and moister than their Italian counterparts.',
        "Immune thrombocytopenia (THROM-bo-si-toe-PE-ne-ah), or ITP, is a bleeding disorder. In ITP, the blood doesn't clot as it should. This is due to a low number of blood cell fragments called platelets (PLATE-lets) or thrombocytes (THROM-bo-sites). Platelets are made in your bone marrow along with other kinds of blood cells.",
        'Spark 1.5.1 works with Java 7 and higher. If you are using Java 8, Spark supports lambda expressions for concisely writing functions, otherwise you can use the classes in the org.apache.spark.api.java.function package. To write a Spark application in Java, you need to add a dependency on Spark. The first thing a Spark program must do is to create a JavaSparkContext object, which tells Spark how to access a cluster. To create a SparkContext you first need to build a SparkConf object that contains information about your application.',
        "The average national cost of a driveway installation is $3,647, with most homeowners spending between $2,026 and $5,278. This data is based on actual project costs as reported by HomeAdvisor members. If you're thinking about installing a driveway, it's important to consider a couple of different things. ",
        'Gluten-Free Confidence Score: 5/10. While pure oats themselves are technically gluten-free, cross contamination can occur while growing, storing, and processing of oats. It is because of this that Quaker Oats cannot guarantee that their oats in their oatmeal are truly gluten-free.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
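
In a retrieve-and-rerank pipeline, the scores are typically used to keep only the best few passages for a query. The following is a minimal sketch of that step. `top_k_by_score` is a hypothetical helper (it is not part of sentence-transformers), and the scores in the usage line are made-up placeholders, not real outputs of this model.

```python
from typing import List, Sequence, Tuple


def top_k_by_score(
    passages: Sequence[str], scores: Sequence[float], k: int = 3
) -> List[Tuple[int, float, str]]:
    """Return the k highest-scoring passages as (index, score, passage) tuples.

    Hypothetical helper for illustration. `scores` is assumed to be aligned
    with `passages`, e.g. the output of `model.predict` on (query, passage) pairs.
    """
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")
    # Sort indices by descending score; ties keep their original order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, float(scores[i]), passages[i]) for i in order[:k]]


# Placeholder scores, not real model outputs.
best = top_k_by_score(["passage a", "passage b", "passage c"], [0.2, 0.9, 0.5], k=2)
print(best)
```

Returning the original index alongside each passage makes it easy to map results back to a corpus, in the same spirit as the `corpus_id` field returned by `model.rank`.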

Evaluation

Metrics

Cross Encoder Reranking

  • Datasets: NanoMSMARCO_R100, NanoNFCorpus_R100 and NanoNQ_R100
  • Evaluated with CrossEncoderRerankingEvaluator with these parameters:
    {
        "at_k": 10,
        "always_rerank_positives": true
    }
    
| Metric | NanoMSMARCO_R100 | NanoNFCorpus_R100 | NanoNQ_R100 |
|---|---|---|---|
| map | 0.0637 (-0.4259) | 0.2746 (+0.0136) | 0.0348 (-0.3848) |
| mrr@10 | 0.0361 (-0.4414) | 0.2927 (-0.2071) | 0.0111 (-0.4156) |
| ndcg@10 | 0.0507 (-0.4898) | 0.2176 (-0.1074) | 0.0300 (-0.4706) |
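
For readers unfamiliar with the metrics, the sketch below shows one common way to compute MRR@10 and NDCG@10 from a reranked list of binary relevance labels. It is an illustration in plain Python, not the code used by CrossEncoderRerankingEvaluator, whose implementation may differ in detail. The parenthesized values in the table appear to be each metric's change relative to the ranking before reranking; that reading is an assumption, as the card does not state it.

```python
import math
from typing import Sequence


def mrr_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """Reciprocal rank of the first relevant item within the top k, else 0."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """NDCG@k with linear gains and a log2 position discount."""

    def dcg(rels: Sequence[int]) -> float:
        # Position 1 has discount log2(2) = 1, position 2 has log2(3), and so on.
        return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(rels, start=1))

    ideal_dcg = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(relevance[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0


# Labels in reranked order: the single relevant passage ended up in third place.
ranked_labels = [0, 0, 1, 0, 0]
print(mrr_at_k(ranked_labels), ndcg_at_k(ranked_labels))
```

Both functions take labels in the order produced by the reranker, so a relevant passage pushed below position 10 contributes nothing to either metric.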

Cross Encoder Nano BEIR

  • Dataset: NanoBEIR_R100_mean
  • Evaluated with CrossEncoderNanoBEIREvaluator with these parameters:
    {
        "dataset_names": [
            "msmarco",
            "nfcorpus",
            "nq"
        ],
        "rerank_k": 100,
        "at_k": 10,
        "always_rerank_positives": true
    }
    
| Metric | Value |
|---|---|
| map | 0.1244 (-0.2657) |
| mrr@10 | 0.1133 (-0.3547) |
| ndcg@10 | 0.0994 (-0.3559) |
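
The NanoBEIR_R100_mean values are consistent with a plain arithmetic mean over the three datasets. The short check below reproduces them from the per-dataset scores in the Cross Encoder Reranking table; it works from the rounded values shown in this card, so agreement to four decimals is not guaranteed in general.

```python
# Per-dataset scores copied from the Cross Encoder Reranking table in this card.
per_dataset = {
    "map": {"msmarco": 0.0637, "nfcorpus": 0.2746, "nq": 0.0348},
    "mrr@10": {"msmarco": 0.0361, "nfcorpus": 0.2927, "nq": 0.0111},
    "ndcg@10": {"msmarco": 0.0507, "nfcorpus": 0.2176, "nq": 0.0300},
}


def mean_metric(values: dict) -> float:
    """Unweighted mean across datasets, rounded to four decimals."""
    return round(sum(values.values()) / len(values), 4)


means = {name: mean_metric(values) for name, values in per_dataset.items()}
print(means)
```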

Training Details

Training Dataset

ms_marco

  • Dataset: ms_marco at a47ee7a
  • Size: 666,193 training samples
  • Columns: query, passage, and label
  • Approximate statistics based on the first 1000 samples:
    | | query | passage | label |
    |---|---|---|---|
    | type | string | string | int |
    | details | min: 8 characters; mean: 33.72 characters; max: 101 characters | min: 121 characters; mean: 417.75 characters; max: 843 characters | 0: ~86.60%; 1: ~13.40% |
  • Samples:
    | query | passage | label |
    |---|---|---|
    | what is signifying | Signifyin' directs attention to the connotative, context-bound significance of words, which is accessible only to those who share the cultural values of a given speech community. While Signifyin(g) is the term coined by Henry Louis Gates, Jr. to represent a black vernacular, the idea stems from the thoughts of Ferdinand De Saussure and the process of signifying--the association between words and the ideas they indicate.. | 0 |
    | definition of scaled | scale noun (SIZE). B2 [S or U] the size or level of something, especially when this is large: We don't yet know the scale of the problem. Nuclear weapons cause destruction on a massive scale (= cause a lot of destruction). scale noun. › [S or U] the size or level of something, especially when this is large: the scale of sth We failed to recognize the scale of the problem. on a large/small scale. on a global/national/international scale. | 0 |
    | what is planibel glass | Product Details. Planibel A is the new high performance LOW-E hard coating specially developed to reach the high ranking within the WER scale (Window Energy Rating). | 0 |
  • Loss: BinaryCrossEntropyLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "pos_weight": null
    }
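
The parameters above show an Identity activation, which suggests the loss is applied to raw logits, and a null `pos_weight`, meaning positive and negative pairs are weighted equally. The sketch below is a plain-Python illustration of binary cross-entropy on a logit under those assumptions; it is not the library's implementation. One useful reference point: a scorer that outputs logit 0 for every pair has a loss of ln 2 ≈ 0.6931, and the training and validation losses logged in this card stay between roughly 0.67 and 0.68.

```python
import math


def bce_with_logits(logit: float, label: float, pos_weight: float = 1.0) -> float:
    """Binary cross-entropy on a raw logit, with an optional positive-class weight."""
    # log(sigmoid(x)), written so that exp() never overflows.
    if logit >= 0:
        log_p = -math.log1p(math.exp(-logit))
    else:
        log_p = logit - math.log1p(math.exp(logit))
    # log(1 - sigmoid(x)) = log(sigmoid(x)) - x
    log_not_p = log_p - logit
    return -(pos_weight * label * log_p + (1.0 - label) * log_not_p)


# Loss of an uninformative prediction (logit 0) for a negative pair.
print(bce_with_logits(0.0, 0.0))
```

With `pos_weight` left at 1.0, each pair contributes equally regardless of label, so on data where most labels are 0 the negatives dominate the average loss.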
    

Evaluation Dataset

ms_marco

  • Dataset: ms_marco at a47ee7a
  • Size: 10,000 evaluation samples
  • Columns: query, passage, and label
  • Approximate statistics based on the first 1000 samples:
    | | query | passage | label |
    |---|---|---|---|
    | type | string | string | int |
    | details | min: 12 characters; mean: 34.24 characters; max: 99 characters | min: 79 characters; mean: 421.13 characters; max: 1061 characters | 0: ~88.60%; 1: ~11.40% |
  • Samples:
    | query | passage | label |
    |---|---|---|
    | what is a ladyfinger | A light, delicate sponge cake roughly shaped like a large, fat finger. Used as an accompaniment to ice cream, puddings and other desserts. One of the oldest and most delicate of sponge cakes, from the House of Savoy in eleventh century France. American ladyfingers are smaller and moister than their Italian counterparts. | 0 |
    | what is itp blood disorder | Immune thrombocytopenia (THROM-bo-si-toe-PE-ne-ah), or ITP, is a bleeding disorder. In ITP, the blood doesn't clot as it should. This is due to a low number of blood cell fragments called platelets (PLATE-lets) or thrombocytes (THROM-bo-sites). Platelets are made in your bone marrow along with other kinds of blood cells. | 0 |
    | what is spark programming | Spark 1.5.1 works with Java 7 and higher. If you are using Java 8, Spark supports lambda expressions for concisely writing functions, otherwise you can use the classes in the org.apache.spark.api.java.function package. To write a Spark application in Java, you need to add a dependency on Spark. The first thing a Spark program must do is to create a JavaSparkContext object, which tells Spark how to access a cluster. To create a SparkContext you first need to build a SparkConf object that contains information about your application. | 0 |
  • Loss: BinaryCrossEntropyLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "pos_weight": null
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 512
  • per_device_eval_batch_size: 512
  • learning_rate: 2e-08
  • warmup_ratio: 0.1
  • seed: 12
  • bf16: True
  • load_best_model_at_end: True

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 512
  • per_device_eval_batch_size: 512
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-08
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 12
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

| Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10 | NanoBEIR_R100_mean_ndcg@10 |
|---|---|---|---|---|---|---|---|
| -1 | -1 | - | - | 0.0281 (-0.5123) | 0.2158 (-0.1092) | 0.0350 (-0.4656) | 0.0930 (-0.3624) |
| 0.0008 | 1 | 0.6816 | - | - | - | - | - |
| 0.7680 | 1000 | 0.6802 | - | - | - | - | - |
| 1.0 | 1302 | - | 0.6756 | 0.0324 (-0.5081) | 0.2171 (-0.1079) | 0.0330 (-0.4677) | 0.0941 (-0.3612) |
| 1.5361 | 2000 | 0.6752 | - | - | - | - | - |
| 2.0 | 2604 | - | 0.6710 | 0.0507 (-0.4898) | 0.2166 (-0.1084) | 0.0184 (-0.4823) | 0.0952 (-0.3602) |
| 2.3041 | 3000 | 0.6718 | - | - | - | - | - |
| **3.0** | **3906** | **-** | **0.6693** | **0.0507 (-0.4898)** | **0.2176 (-0.1074)** | **0.0300 (-0.4706)** | **0.0994 (-0.3559)** |
| -1 | -1 | - | - | 0.0507 (-0.4898) | 0.2176 (-0.1074) | 0.0300 (-0.4706) | 0.0994 (-0.3559) |

  • The bold row denotes the saved checkpoint.
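
The logged values can be cross-checked with a few lines of Python. The sketch below picks the evaluation step with the lowest validation loss and recomputes the fractional epochs in the Epoch column. It assumes the saved checkpoint is selected by lowest validation loss, which is a guess based on `load_best_model_at_end: True` with no other selection metric listed in this card.

```python
# Validation losses copied from the training logs: step -> loss.
validation_loss = {1302: 0.6756, 2604: 0.6710, 3906: 0.6693}
steps_per_epoch = 1302  # the step logged at epoch 1.0

# Assumption: the best checkpoint is the one with the lowest validation loss.
best_step = min(validation_loss, key=validation_loss.get)
best_epoch = best_step / steps_per_epoch


def epoch_at(step: int) -> float:
    """Fractional epoch for a given optimizer step, rounded as in the log."""
    return round(step / steps_per_epoch, 4)


print(best_step, best_epoch)
print([epoch_at(s) for s in (1000, 2000, 3000)])
```

Under that assumption the selected step is 3906, whose metrics match the final evaluation row of the log.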

Framework Versions

  • Python: 3.11.5
  • Sentence Transformers: 4.0.1
  • Transformers: 4.50.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.6.0
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

Model size: 22.7M params (Safetensors, tensor type F32)