---
language:
  - en
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:404290
  - loss:OnlineContrastiveLoss
base_model: sentence-transformers/stsb-distilbert-base
widget:
  - source_sentence: What does the lock symbol on my iPhone 6 means?
    sentences:
      - How did the Soviet Navy compare to the US Navy?
      - What does the iPhone icon with lock and arrow mean?
      - What is the importance of electrical engineering?
  - source_sentence: >-
      Why are blue and red neon lights illegal or restricted for commercial uses
      in Honduras?
    sentences:
      - >-
        Why are blue and red neon lights illegal or restricted for commercial
        uses in Colombia?
      - Why would I want a Raspberry Pi?
      - How do I see things as they are?
  - source_sentence: How will Hillary Clinton deal with russia?
    sentences:
      - >-
        What would have happened if Barty crouch Jr escaped the dementors and
        made it back to the graveyard?
      - How will Hillary Clinton deal with terrorism?
      - >-
        I am a commercial student who wishes to study accounting, but now I wish
        to study law. Is it possible?
  - source_sentence: What are the best managing skills?
    sentences:
      - What are the top skills of effective Product Managers?
      - How do I lose weight in a short time?
      - What are some good songs for lyrical dances?
  - source_sentence: What is the best fact checking sources that all Quorans will most trust?
    sentences:
      - Do people still write love letters?
      - >-
        Is working in McKinsey one of the best and surest ways to get into
        Harvard Business School?
      - What is the most memorable book that Quorans have read?
datasets:
  - sentence-transformers/quora-duplicates
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy
  - cosine_accuracy_threshold
  - cosine_f1
  - cosine_f1_threshold
  - cosine_precision
  - cosine_recall
  - cosine_ap
  - cosine_mcc
  - average_precision
  - f1
  - precision
  - recall
  - threshold
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
model-index:
  - name: SentenceTransformer based on sentence-transformers/stsb-distilbert-base
    results:
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: quora duplicates
          type: quora-duplicates
        metrics:
          - type: cosine_accuracy
            value: 0.869
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.813665509223938
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.8390243902439025
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.7617226243019104
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.7818181818181819
            name: Cosine Precision
          - type: cosine_recall
            value: 0.9052631578947369
            name: Cosine Recall
          - type: cosine_ap
            value: 0.8852756469769394
            name: Cosine Ap
          - type: cosine_mcc
            value: 0.7337941850587686
            name: Cosine Mcc
      - task:
          type: paraphrase-mining
          name: Paraphrase Mining
        dataset:
          name: quora duplicates dev
          type: quora-duplicates-dev
        metrics:
          - type: average_precision
            value: 0.5427423938771084
            name: Average Precision
          - type: f1
            value: 0.5532539228607665
            name: F1
          - type: precision
            value: 0.5508021390374331
            name: Precision
          - type: recall
            value: 0.5557276315132138
            name: Recall
          - type: threshold
            value: 0.865865558385849
            name: Threshold
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: cosine_accuracy@1
            value: 0.9298
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.9732
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.982
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9868
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.9298
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.4154
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.26792
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.1417
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.8009069531416296
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.9349178789609083
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.9610774822138647
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9765400300287947
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.9525570390902354
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.9522342063492065
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.9400294978560327
            name: Cosine Map@100
---

# SentenceTransformer based on sentence-transformers/stsb-distilbert-base

This is a sentence-transformers model finetuned from sentence-transformers/stsb-distilbert-base on the quora-duplicates dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** sentence-transformers/stsb-distilbert-base
- **Maximum Sequence Length:** 128 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:** sentence-transformers/quora-duplicates
- **Language:** en

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
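
The `Pooling` module performs mean pooling over the token embeddings. For reference, a minimal sketch of that step with plain `transformers` (loading through `SentenceTransformer`, as in the Usage section below, is the supported path; the repo id here is taken from the usage example):

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Direct load of the underlying DistilBERT; repo id from the usage example below.
model_id = "yahyaabd/stsb-distilbert-base-ocl"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

batch = tokenizer(
    ["How do I learn Python?"],
    padding=True, truncation=True, max_length=128, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq_len, 768)

# Mean pooling: average the token embeddings, masking out padding positions.
mask = batch["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 768])
```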

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("yahyaabd/stsb-distilbert-base-ocl")
# Run inference
sentences = [
    'What is the best fact checking sources that all Quorans will most trust?',
    'What is the most memorable book that Quorans have read?',
    'Is working in McKinsey one of the best and surest ways to get into Harvard Business School?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```

## Evaluation

### Metrics

#### Binary Classification

| Metric                    |  Value |
|:--------------------------|-------:|
| cosine_accuracy           |  0.869 |
| cosine_accuracy_threshold | 0.8137 |
| cosine_f1                 |  0.839 |
| cosine_f1_threshold       | 0.7617 |
| cosine_precision          | 0.7818 |
| cosine_recall             | 0.9053 |
| cosine_ap                 | 0.8853 |
| cosine_mcc                | 0.7338 |
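
These figures are produced by sentence-transformers' `BinaryClassificationEvaluator`. A minimal sketch of re-running it, under the assumption that a 1,000-pair slice of the `pair-class` subset stands in for the exact evaluation split (which the card does not pin down):

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer("yahyaabd/stsb-distilbert-base-ocl")
# Assumption: a small slice stands in for the undocumented eval split.
ds = load_dataset("sentence-transformers/quora-duplicates", "pair-class", split="train[:1000]")
evaluator = BinaryClassificationEvaluator(
    sentences1=ds["sentence1"],
    sentences2=ds["sentence2"],
    labels=ds["label"],
    name="quora-duplicates",
)
results = evaluator(model)  # accuracy, F1, precision, recall, AP, MCC + thresholds
print(results)
```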

#### Paraphrase Mining

| Metric            |  Value |
|:------------------|-------:|
| average_precision | 0.5427 |
| f1                | 0.5533 |
| precision         | 0.5508 |
| recall            | 0.5557 |
| threshold         | 0.8659 |
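
Paraphrase mining scores all sentence pairs within a single corpus. A sketch with `sentence_transformers.util.paraphrase_mining`, using the tuned decision threshold (0.8659) from the table above as the cutoff; the corpus here is illustrative, not the evaluation data:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import paraphrase_mining

model = SentenceTransformer("yahyaabd/stsb-distilbert-base-ocl")
corpus = [
    "How do I learn Python?",
    "What is the best way to learn Python?",
    "Why is the sky blue?",
]
# paraphrase_mining returns [score, i, j] triples, highest cosine score first.
pairs = paraphrase_mining(model, corpus)
duplicates = [(i, j, score) for score, i, j in pairs if score >= 0.8659]
print(duplicates)  # index pairs above the tuned threshold from the table above
```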

#### Information Retrieval

| Metric              |  Value |
|:--------------------|-------:|
| cosine_accuracy@1   | 0.9298 |
| cosine_accuracy@3   | 0.9732 |
| cosine_accuracy@5   |  0.982 |
| cosine_accuracy@10  | 0.9868 |
| cosine_precision@1  | 0.9298 |
| cosine_precision@3  | 0.4154 |
| cosine_precision@5  | 0.2679 |
| cosine_precision@10 | 0.1417 |
| cosine_recall@1     | 0.8009 |
| cosine_recall@3     | 0.9349 |
| cosine_recall@5     | 0.9611 |
| cosine_recall@10    | 0.9765 |
| cosine_ndcg@10      | 0.9526 |
| cosine_mrr@10       | 0.9522 |
| cosine_map@100      |   0.94 |
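
The retrieval metrics correspond to ranking relevant questions in a corpus by cosine similarity. A small illustrative sketch with `util.semantic_search` (toy corpus, not the actual evaluation set):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("yahyaabd/stsb-distilbert-base-ocl")
corpus = ["How do I learn Python?", "Why is the sky blue?", "How do I bake bread?"]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode("Best way to learn Python?", convert_to_tensor=True)

# Rank the corpus by cosine similarity and keep the top 3 hits.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 4))
```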

## Training Details

### Training Dataset

#### quora-duplicates

- Dataset: [quora-duplicates](https://huggingface.co/datasets/sentence-transformers/quora-duplicates) at revision 451a485
- Size: 404,290 training samples
- Columns: sentence1, sentence2, and label
- Approximate statistics based on the first 1000 samples:

  |         | sentence1                                         | sentence2                                        | label                  |
  |:--------|:--------------------------------------------------|:-------------------------------------------------|:-----------------------|
  | type    | string                                            | string                                           | int                    |
  | details | min: 6 tokens, mean: 16.01 tokens, max: 67 tokens | min: 6 tokens, mean: 15.9 tokens, max: 72 tokens | 0: ~64.40%, 1: ~35.60% |

- Samples:

  | sentence1 | sentence2 | label |
  |:----------|:----------|:------|
  | How much worse do things need to get before the "blue" states cut off welfare to the "red" states? | If the red states and the blue states were separated into two countries, which country would be more successful? | 0 |
  | Can you offer me any advice on how to lose weight? | What are the best ways to lose weight? What is the best diet plan? | 1 |
  | How do I break my knee? | How do I break my elbow? | 0 |

- Loss: OnlineContrastiveLoss (see the training sketch after this list)
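
As a reference for the Loss entry above, a minimal training-setup sketch: the `pair-class` subset provides the (sentence1, sentence2, label) columns described in the list, and `OnlineContrastiveLoss` applies a contrastive loss to only the hard pairs in each batch. Names follow the public sentence-transformers v3 API:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import OnlineContrastiveLoss

model = SentenceTransformer("sentence-transformers/stsb-distilbert-base")
# The pair-class subset carries the (sentence1, sentence2, label) columns above.
train_dataset = load_dataset(
    "sentence-transformers/quora-duplicates", "pair-class", split="train"
)
# OnlineContrastiveLoss keeps only hard positives (low-similarity duplicates)
# and hard negatives (high-similarity non-duplicates) in each batch, then
# applies the contrastive loss to them.
loss = OnlineContrastiveLoss(model)
```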

### Evaluation Dataset

#### quora-duplicates

- Dataset: [quora-duplicates](https://huggingface.co/datasets/sentence-transformers/quora-duplicates) at revision 451a485
- Size: 404,290 evaluation samples
- Columns: sentence1, sentence2, and label
- Approximate statistics based on the first 1000 samples:

  |         | sentence1                                         | sentence2                                        | label                  |
  |:--------|:--------------------------------------------------|:-------------------------------------------------|:-----------------------|
  | type    | string                                            | string                                           | int                    |
  | details | min: 6 tokens, mean: 15.98 tokens, max: 53 tokens | min: 6 tokens, mean: 15.9 tokens, max: 77 tokens | 0: ~62.00%, 1: ~38.00% |

- Samples:

  | sentence1 | sentence2 | label |
  |:----------|:----------|:------|
  | Which is the best SAP online training centre at Hyderabad? | Which is the best sap workflow online training institute in Hyderabad? | 1 |
  | How did World War Two start? | What will most likely cause World War III? | 0 |
  | How do I find a unique string from a given string in Java without methods such as split, contain, and divide? | How can I split the string "[] {() <>} []" into " [,], {, (, ..." in Java? | 0 |

- Loss: OnlineContrastiveLoss

### Training Hyperparameters

#### Non-Default Hyperparameters

- eval_strategy: steps
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- num_train_epochs: 1
- warmup_ratio: 0.1
- fp16: True
- batch_sampler: no_duplicates (mirrored in the sketch below)
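
Continuing the sketch under Training Dataset, the listed non-default values map one-to-one onto `SentenceTransformerTrainingArguments`; the output directory and the evaluation split are assumptions, since the card does not record them.

```python
from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.training_args import BatchSamplers

# Assumption: hold out 1,000 pairs for evaluation; the card does not record
# how its evaluation split was drawn.
split = train_dataset.train_test_split(test_size=1000, seed=42)

args = SentenceTransformerTrainingArguments(
    output_dir="stsb-distilbert-base-ocl",  # hypothetical output path
    num_train_epochs=1,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # the no_duplicates sampler above
)
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=split["train"],
    eval_dataset=split["test"],
    loss=loss,
)
trainer.train()
```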

#### All Hyperparameters

<details><summary>Click to expand</summary>

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional

</details>

### Training Logs

| Epoch  | Step | Training Loss | Validation Loss | quora-duplicates_cosine_ap | quora-duplicates-dev_average_precision | cosine_ndcg@10 |
|:------:|:----:|:-------------:|:---------------:|:--------------------------:|:--------------------------------------:|:--------------:|
| 0      | 0    | -             | -               | 0.7402                     | 0.4200                                 | 0.9413         |
| 0.0640 | 100  | 2.481         | -               | -                          | -                                      | -              |
| 0.1280 | 200  | 2.1466        | -               | -                          | -                                      | -              |
| 0.1599 | 250  | -             | 1.7997          | 0.8327                     | 0.4596                                 | 0.9355         |
| 0.1919 | 300  | 2.0354        | -               | -                          | -                                      | -              |
| 0.2559 | 400  | 1.9342        | -               | -                          | -                                      | -              |
| 0.3199 | 500  | 1.9132        | 1.6231          | 0.8617                     | 0.4896                                 | 0.9425         |
| 0.3839 | 600  | 1.8015        | -               | -                          | -                                      | -              |
| 0.4479 | 700  | 1.7407        | -               | -                          | -                                      | -              |
| 0.4798 | 750  | -             | 1.4953          | 0.8737                     | 0.5112                                 | 0.9468         |
| 0.5118 | 800  | 1.6454        | -               | -                          | -                                      | -              |
| 0.5758 | 900  | 1.6568        | -               | -                          | -                                      | -              |
| 0.6398 | 1000 | 1.6811        | 1.4678          | 0.8751                     | 0.5290                                 | 0.9457         |
| 0.7038 | 1100 | 1.711         | -               | -                          | -                                      | -              |
| 0.7678 | 1200 | 1.6449        | -               | -                          | -                                      | -              |
| 0.7997 | 1250 | -             | 1.4363          | 0.8811                     | 0.5327                                 | 0.9507         |
| 0.8317 | 1300 | 1.5921        | -               | -                          | -                                      | -              |
| 0.8957 | 1400 | 1.5062        | -               | -                          | -                                      | -              |
| 0.9597 | 1500 | 1.5728        | 1.4029          | 0.8853                     | 0.5427                                 | 0.9526         |

## Framework Versions

- Python: 3.10.12
- Sentence Transformers: 3.4.0
- Transformers: 4.48.1
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.2.0
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```