SentenceTransformer based on sentence-transformers/stsb-distilbert-base

This is a sentence-transformers model finetuned from sentence-transformers/stsb-distilbert-base on the quora-duplicates dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/stsb-distilbert-base
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: quora-duplicates

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
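
The Pooling module above averages token embeddings (pooling_mode_mean_tokens: True), with no normalization layer on top. As a minimal sketch, assuming only the plain transformers API, the same 768-dimensional embedding can be reproduced by mean-pooling the DistilBERT token embeddings over the attention mask:

import torch
from transformers import AutoModel, AutoTokenizer

# Load the underlying DistilBERT transformer and tokenizer directly.
tokenizer = AutoTokenizer.from_pretrained("CalebR84/stsb-distilbert-base-ocl")
model = AutoModel.from_pretrained("CalebR84/stsb-distilbert-base-ocl")

encoded = tokenizer(
    ["How can I lose weight quickly?"],
    padding=True, truncation=True, max_length=128, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state  # (batch, seq_len, 768)

# Mean over real tokens only: mask out padding before averaging.
mask = encoded["attention_mask"].unsqueeze(-1).float()
embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
print(embedding.shape)  # torch.Size([1, 768])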

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("CalebR84/stsb-distilbert-base-ocl")
# Run inference
sentences = [
    'How can I lose weight quickly? Need serious help.',
    'How can you lose weight really quick?',
    'Why are there so many half-built, abandoned buildings in Mexico?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Binary Classification

  • Dataset: quora-duplicates
  • Evaluated with BinaryClassificationEvaluator

Metric Value
cosine_accuracy 0.866
cosine_accuracy_threshold 0.786
cosine_f1 0.8321
cosine_f1_threshold 0.7849
cosine_precision 0.7812
cosine_recall 0.8901
cosine_ap 0.8773
cosine_mcc 0.7256
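
The cosine_accuracy_threshold above is the similarity cutoff that maximized accuracy on the dev pairs. A minimal sketch of using it as a duplicate/non-duplicate decision rule (treating 0.786 as a value tuned on this dev set, not a universal constant):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("CalebR84/stsb-distilbert-base-ocl")

q1 = "How can I lose weight quickly? Need serious help."
q2 = "How can you lose weight really quick?"
emb = model.encode([q1, q2])

# model.similarity defaults to cosine similarity for this model.
score = model.similarity(emb[0], emb[1]).item()
print(f"cosine similarity: {score:.4f}")
print("duplicate" if score >= 0.786 else "not duplicate")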

Paraphrase Mining

  • Dataset: quora-duplicates-dev
  • Evaluated with ParaphraseMiningEvaluator with these parameters:
{'add_transitive_closure': ParaphraseMiningEvaluator.add_transitive_closure, 'max_pairs': 500000, 'top_k': 100}
    
Metric Value
average_precision 0.6393
f1 0.6435
precision 0.6447
recall 0.6424
threshold 0.8727
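
These figures come from mining paraphrase pairs over the dev questions with the parameters listed above. A minimal sketch of the same operation via sentence_transformers.util.paraphrase_mining (the corpus here is an illustrative stand-in):

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import paraphrase_mining

model = SentenceTransformer("CalebR84/stsb-distilbert-base-ocl")

corpus = [
    "How can I lose weight quickly?",
    "How can you lose weight really quick?",
    "What is the best way to learn Python?",
    "How do I get started with Python programming?",
]
# Same limits as the evaluator: at most 500k pairs, 100 candidates per query.
pairs = paraphrase_mining(model, corpus, max_pairs=500000, top_k=100)

# Each entry is [score, i, j]; the reported threshold (~0.8727) keeps
# only pairs the model considers likely duplicates.
for score, i, j in pairs:
    if score >= 0.8727:
        print(f"{score:.4f} | {corpus[i]} | {corpus[j]}")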

Information Retrieval

  • Evaluated with InformationRetrievalEvaluator

Metric Value
cosine_accuracy@1 0.9172
cosine_accuracy@3 0.9588
cosine_accuracy@5 0.9672
cosine_accuracy@10 0.9762
cosine_precision@1 0.9172
cosine_precision@3 0.4102
cosine_precision@5 0.2644
cosine_precision@10 0.1406
cosine_recall@1 0.7869
cosine_recall@3 0.9198
cosine_recall@5 0.9442
cosine_recall@10 0.9641
cosine_ndcg@10 0.9388
cosine_mrr@10 0.9393
cosine_map@100 0.9258
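
The accuracy@k, precision@k, and recall@k figures describe retrieving duplicate questions from a corpus by cosine similarity. A minimal sketch of that setup with sentence_transformers.util.semantic_search (corpus and query are illustrative stand-ins):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("CalebR84/stsb-distilbert-base-ocl")

corpus = [
    "How can you lose weight really quick?",
    "Which book is the best for algorithms and data structures?",
    "Why are there so many abandoned buildings in Mexico?",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode("How can I lose weight quickly?", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)[0]

# Each hit holds a corpus index and its cosine score, best first.
for hit in hits:
    print(f"{hit['score']:.4f} | {corpus[hit['corpus_id']]}")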

Training Details

Training Dataset

quora-duplicates

  • Dataset: quora-duplicates at 451a485
  • Size: 100,000 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    - sentence1: string; min 6 tokens, mean 15.56 tokens, max 62 tokens
    - sentence2: string; min 6 tokens, mean 15.73 tokens, max 84 tokens
    - label: int; 0: ~63.20%, 1: ~36.80%
  • Samples:
    - sentence1: "What are some of the greatest books not adapted into film yet?"
      sentence2: "What book should be made into a movie?"
      label: 0
    - sentence1: "How can I increase my communication skills?"
      sentence2: "How we improve our communication skills?"
      label: 1
    - sentence1: "Heymen I have a note5 it give me this message when a turn it on and shout down (custom pinary are blocked by frp lock) I try odin and kies butnot work?"
      sentence2: "Setup dubbing studio with very less budget in India?"
      label: 0
  • Loss: OnlineContrastiveLoss

Evaluation Dataset

quora-duplicates

  • Dataset: quora-duplicates at 451a485
  • Size: 1,000 evaluation samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    - sentence1: string; min 3 tokens, mean 15.37 tokens, max 62 tokens
    - sentence2: string; min 6 tokens, mean 15.63 tokens, max 78 tokens
    - label: int; 0: ~62.70%, 1: ~37.30%
  • Samples:
    - sentence1: "Which is the best book to learn data structures and algorithms?"
      sentence2: "Which book is the best book for algorithm and datastructure?"
      label: 1
    - sentence1: "Does modafinil shows up on a drug test? Because my urine smells a lot of medicine?"
      sentence2: "Can Modafinil come out in a drug test?"
      label: 0
    - sentence1: "Does the size of a penis matter?"
      sentence2: "Does penis size matters for girls?"
      label: 1
  • Loss: OnlineContrastiveLoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 10
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
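
Putting the dataset, loss, and non-default hyperparameters together, a minimal training sketch follows. The dataset id and subset ("sentence-transformers/quora-duplicates", "pair-class") and the exact train/eval split are assumptions inferred from the column names and sample counts above:

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import OnlineContrastiveLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("sentence-transformers/stsb-distilbert-base")

# Assumed dataset location; columns are sentence1, sentence2, label.
dataset = load_dataset("sentence-transformers/quora-duplicates", "pair-class", split="train")
train_dataset = dataset.select(range(100_000))
eval_dataset = dataset.select(range(100_000, 101_000))

loss = OnlineContrastiveLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="stsb-distilbert-base-ocl",
    num_train_epochs=10,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
    # Contrastive losses benefit from batches without repeated sentences.
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()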

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss quora-duplicates_cosine_ap quora-duplicates-dev_average_precision cosine_ndcg@10
0 0 - - 0.6905 0.4200 0.9397
0.0640 100 2.6402 - - - -
0.1280 200 2.4398 - - - -
0.1599 250 - 2.4217 0.7392 0.4765 0.9426
0.1919 300 2.2461 - - - -
0.2559 400 2.1433 - - - -
0.3199 500 2.0417 2.1120 0.7970 0.4566 0.9429
0.3839 600 2.0441 - - - -
0.4479 700 1.8907 - - - -
0.4798 750 - 2.0011 0.8229 0.4820 0.9468
0.5118 800 1.8985 - - - -
0.5758 900 1.7521 - - - -
0.6398 1000 1.8888 1.8010 0.8382 0.4925 0.9425
0.7038 1100 1.8524 - - - -
0.7678 1200 1.6956 - - - -
0.7997 1250 - 1.8004 0.8438 0.4283 0.9336
0.8317 1300 1.7519 - - - -
0.8957 1400 1.7515 - - - -
0.9597 1500 1.7288 1.7434 0.8352 0.5050 0.9428
1.0237 1600 1.533 - - - -
1.0877 1700 1.2543 - - - -
1.1196 1750 - 1.7109 0.8514 0.5299 0.9415
1.1516 1800 1.3201 - - - -
1.2156 1900 1.3309 - - - -
1.2796 2000 1.3256 1.7111 0.8528 0.5138 0.9393
1.3436 2100 1.2865 - - - -
1.4075 2200 1.2659 - - - -
1.4395 2250 - 1.7974 0.8468 0.5320 0.9390
1.4715 2300 1.2601 - - - -
1.5355 2400 1.3337 - - - -
1.5995 2500 1.3319 1.6922 0.8575 0.5399 0.9416
1.6635 2600 1.3232 - - - -
1.7274 2700 1.3684 - - - -
1.7594 2750 - 1.5772 0.8581 0.5592 0.9484
1.7914 2800 1.2706 - - - -
1.8554 2900 1.3186 - - - -
1.9194 3000 1.2336 1.5423 0.8656 0.5749 0.9433
1.9834 3100 1.2193 - - - -
2.0473 3200 0.868 - - - -
2.0793 3250 - 1.6575 0.8632 0.5735 0.9395
2.1113 3300 0.6411 - - - -
2.1753 3400 0.7127 - - - -
2.2393 3500 0.7044 1.5778 0.8718 0.5823 0.9387
2.3033 3600 0.6299 - - - -
2.3672 3700 0.7162 - - - -
2.3992 3750 - 1.6300 0.8595 0.5936 0.9414
2.4312 3800 0.6642 - - - -
2.4952 3900 0.6902 - - - -
2.5592 4000 0.7959 1.6070 0.8637 0.6006 0.9363
2.6232 4100 0.7588 - - - -
2.6871 4200 0.6925 - - - -
2.7191 4250 - 1.6787 0.8682 0.6006 0.9411
2.7511 4300 0.7226 - - - -
2.8151 4400 0.7507 - - - -
2.8791 4500 0.7563 1.6040 0.8658 0.6061 0.9416
2.9431 4600 0.7737 - - - -
3.0070 4700 0.6525 - - - -
3.0390 4750 - 1.6782 0.8652 0.5983 0.9401
3.0710 4800 0.3831 - - - -
3.1350 4900 0.297 - - - -
3.1990 5000 0.3725 1.7229 0.8588 0.6175 0.9418
3.2630 5100 0.4142 - - - -
3.3269 5200 0.4415 - - - -
3.3589 5250 - 1.6564 0.8635 0.6026 0.9379
3.3909 5300 0.3729 - - - -
3.4549 5400 0.4164 - - - -
3.5189 5500 0.3668 1.5964 0.8677 0.6105 0.9358
3.5829 5600 0.4184 - - - -
3.6468 5700 0.4311 - - - -
3.6788 5750 - 1.6523 0.8680 0.6130 0.9365
3.7108 5800 0.4222 - - - -
3.7748 5900 0.4302 - - - -
3.8388 6000 0.428 1.6625 0.8674 0.6163 0.9370
3.9028 6100 0.3898 - - - -
3.9667 6200 0.4255 - - - -
3.9987 6250 - 1.6145 0.8680 0.6118 0.9347
4.0307 6300 0.3456 - - - -
4.0947 6400 0.2265 - - - -
4.1587 6500 0.1913 1.7208 0.8595 0.6339 0.9433
4.2226 6600 0.2258 - - - -
4.2866 6700 0.2484 - - - -
4.3186 6750 - 1.6286 0.8600 0.6313 0.9394
4.3506 6800 0.1977 - - - -
4.4146 6900 0.2013 - - - -
4.4786 7000 0.2351 1.6910 0.8651 0.6193 0.9401
4.5425 7100 0.2356 - - - -
4.6065 7200 0.2542 - - - -
4.6385 7250 - 1.6955 0.8643 0.6129 0.9357
4.6705 7300 0.2592 - - - -
4.7345 7400 0.2585 - - - -
4.7985 7500 0.2375 1.7593 0.8647 0.6143 0.9325
4.8624 7600 0.2506 - - - -
4.9264 7700 0.2394 - - - -
4.9584 7750 - 1.6051 0.8720 0.6213 0.9350
4.9904 7800 0.2374 - - - -
5.0544 7900 0.1675 - - - -
5.1184 8000 0.131 1.5864 0.8673 0.6201 0.9377
5.1823 8100 0.1308 - - - -
5.2463 8200 0.1483 - - - -
5.2783 8250 - 1.5976 0.8698 0.6136 0.9359
5.3103 8300 0.1413 - - - -
5.3743 8400 0.1392 - - - -
5.4383 8500 0.1464 1.5980 0.8661 0.6267 0.9346
5.5022 8600 0.1781 - - - -
5.5662 8700 0.151 - - - -
5.5982 8750 - 1.5343 0.8756 0.6245 0.9352
5.6302 8800 0.1568 - - - -
5.6942 8900 0.1702 - - - -
5.7582 9000 0.1362 1.7121 0.8675 0.6230 0.9362
5.8221 9100 0.1371 - - - -
5.8861 9200 0.1381 - - - -
5.9181 9250 - 1.6326 0.8671 0.6122 0.9302
5.9501 9300 0.1691 - - - -
6.0141 9400 0.1701 - - - -
6.0781 9500 0.0935 1.5705 0.8709 0.6066 0.9293
6.1420 9600 0.0852 - - - -
6.2060 9700 0.0874 - - - -
6.2380 9750 - 1.5643 0.8724 0.6061 0.9307
6.2700 9800 0.0889 - - - -
6.3340 9900 0.0972 - - - -
6.3980 10000 0.1011 1.5622 0.8736 0.6153 0.9328
6.4619 10100 0.0962 - - - -
6.5259 10200 0.1259 - - - -
6.5579 10250 - 1.5406 0.8687 0.6293 0.9373
6.5899 10300 0.0925 - - - -
6.6539 10400 0.1138 - - - -
6.7179 10500 0.0788 1.5450 0.8658 0.6226 0.9349
6.7818 10600 0.1112 - - - -
6.8458 10700 0.0922 - - - -
6.8778 10750 - 1.5063 0.8736 0.6245 0.9370
6.9098 10800 0.1173 - - - -
6.9738 10900 0.1141 - - - -
7.0377 11000 0.0637 1.5007 0.8741 0.6270 0.9379
7.1017 11100 0.0713 - - - -
7.1657 11200 0.0754 - - - -
7.1977 11250 - 1.5081 0.8725 0.6273 0.9376
7.2297 11300 0.04 - - - -
7.2937 11400 0.0695 - - - -
7.3576 11500 0.034 1.5598 0.8710 0.6179 0.9350
7.4216 11600 0.0513 - - - -
7.4856 11700 0.0749 - - - -
7.5176 11750 - 1.6118 0.8694 0.6264 0.9380
7.5496 11800 0.0708 - - - -
7.6136 11900 0.0939 - - - -
7.6775 12000 0.059 1.6282 0.8708 0.6271 0.9354
7.7415 12100 0.0847 - - - -
7.8055 12200 0.0521 - - - -
7.8375 12250 - 1.5478 0.8683 0.6359 0.9388
7.8695 12300 0.0394 - - - -
7.9335 12400 0.0619 - - - -
7.9974 12500 0.0593 1.5440 0.8771 0.6387 0.9393
8.0614 12600 0.0292 - - - -
8.1254 12700 0.0267 - - - -
8.1574 12750 - 1.5419 0.8773 0.6290 0.9388
8.1894 12800 0.0334 - - - -
8.2534 12900 0.05 - - - -
8.3173 13000 0.0439 1.5589 0.8740 0.6322 0.9384
8.3813 13100 0.0409 - - - -
8.4453 13200 0.03 - - - -
8.4773 13250 - 1.5472 0.8730 0.6347 0.9398
8.5093 13300 0.0373 - - - -
8.5733 13400 0.0404 - - - -
8.6372 13500 0.0357 1.5332 0.8749 0.6327 0.9404
8.7012 13600 0.023 - - - -
8.7652 13700 0.0256 - - - -
8.7972 13750 - 1.5154 0.8781 0.6337 0.9379
8.8292 13800 0.0563 - - - -
8.8932 13900 0.029 - - - -
8.9571 14000 0.0395 1.5503 0.8771 0.6344 0.9390
9.0211 14100 0.0296 - - - -
9.0851 14200 0.0308 - - - -
9.1171 14250 - 1.5385 0.8771 0.6363 0.9391
9.1491 14300 0.035 - - - -
9.2131 14400 0.0217 - - - -
9.2770 14500 0.0192 1.5592 0.8777 0.6373 0.9393
9.3410 14600 0.0369 - - - -
9.4050 14700 0.0186 - - - -
9.4370 14750 - 1.5626 0.8771 0.6368 0.9389
9.4690 14800 0.0303 - - - -
9.5329 14900 0.0181 - - - -
9.5969 15000 0.0217 1.5466 0.8782 0.6387 0.9390
9.6609 15100 0.0463 - - - -
9.7249 15200 0.0211 - - - -
9.7569 15250 - 1.5440 0.8772 0.6401 0.9395
9.7889 15300 0.0216 - - - -
9.8528 15400 0.0328 - - - -
9.9168 15500 0.0154 1.5399 0.8773 0.6393 0.9388
9.9808 15600 0.0263 - - - -

Framework Versions

  • Python: 3.12.9
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.7.0+cu126
  • Accelerate: 1.7.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}