SentenceTransformer based on sentence-transformers/stsb-distilbert-base

This is a sentence-transformers model finetuned from sentence-transformers/stsb-distilbert-base on the quora-duplicates dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/stsb-distilbert-base
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: quora-duplicates

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
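
The Pooling module above averages token embeddings (pooling_mode_mean_tokens: True), with no normalization layer on top. As a minimal sketch, assuming only the plain transformers API, the same 768-dimensional embedding can be reproduced by mean-pooling the DistilBERT token embeddings over the attention mask:

import torch
from transformers import AutoModel, AutoTokenizer

# Load the underlying DistilBERT transformer and tokenizer directly.
tokenizer = AutoTokenizer.from_pretrained("CalebR84/stsb-distilbert-base-ocl")
model = AutoModel.from_pretrained("CalebR84/stsb-distilbert-base-ocl")

encoded = tokenizer(
    ["How can I lose weight quickly?"],
    padding=True, truncation=True, max_length=128, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state  # (batch, seq_len, 768)

# Mean over real tokens only: mask out padding before averaging.
mask = encoded["attention_mask"].unsqueeze(-1).float()
embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
print(embedding.shape)  # torch.Size([1, 768])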

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("CalebR84/stsb-distilbert-base-ocl")
# Run inference
sentences = [
    'How can I lose weight quickly? Need serious help.',
    'How can you lose weight really quick?',
    'Why are there so many half-built, abandoned buildings in Mexico?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Binary Classification

  • Dataset: quora-duplicates
  • Evaluated with BinaryClassificationEvaluator

Metric Value
cosine_accuracy 0.866
cosine_accuracy_threshold 0.786
cosine_f1 0.8321
cosine_f1_threshold 0.7849
cosine_precision 0.7812
cosine_recall 0.8901
cosine_ap 0.8773
cosine_mcc 0.7256
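
The cosine_accuracy_threshold above is the similarity cutoff that maximized accuracy on the dev pairs. A minimal sketch of using it as a duplicate/non-duplicate decision rule (treating 0.786 as a value tuned on this dev set, not a universal constant):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("CalebR84/stsb-distilbert-base-ocl")

q1 = "How can I lose weight quickly? Need serious help."
q2 = "How can you lose weight really quick?"
emb = model.encode([q1, q2])

# model.similarity defaults to cosine similarity for this model.
score = model.similarity(emb[0], emb[1]).item()
print(f"cosine similarity: {score:.4f}")
print("duplicate" if score >= 0.786 else "not duplicate")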

Paraphrase Mining

  • Dataset: quora-duplicates-dev
  • Evaluated with ParaphraseMiningEvaluator with these parameters:
{'add_transitive_closure': ParaphraseMiningEvaluator.add_transitive_closure, 'max_pairs': 500000, 'top_k': 100}
    
Metric Value
average_precision 0.6393
f1 0.6435
precision 0.6447
recall 0.6424
threshold 0.8727
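
These figures come from mining paraphrase pairs over the dev questions with the parameters listed above. A minimal sketch of the same operation via sentence_transformers.util.paraphrase_mining (the corpus here is an illustrative stand-in):

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import paraphrase_mining

model = SentenceTransformer("CalebR84/stsb-distilbert-base-ocl")

corpus = [
    "How can I lose weight quickly?",
    "How can you lose weight really quick?",
    "What is the best way to learn Python?",
    "How do I get started with Python programming?",
]
# Same limits as the evaluator: at most 500k pairs, 100 candidates per query.
pairs = paraphrase_mining(model, corpus, max_pairs=500000, top_k=100)

# Each entry is [score, i, j]; the reported threshold (~0.8727) keeps
# only pairs the model considers likely duplicates.
for score, i, j in pairs:
    if score >= 0.8727:
        print(f"{score:.4f} | {corpus[i]} | {corpus[j]}")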

Information Retrieval

  • Evaluated with InformationRetrievalEvaluator

Metric Value
cosine_accuracy@1 0.9172
cosine_accuracy@3 0.9588
cosine_accuracy@5 0.9672
cosine_accuracy@10 0.9762
cosine_precision@1 0.9172
cosine_precision@3 0.4102
cosine_precision@5 0.2644
cosine_precision@10 0.1406
cosine_recall@1 0.7869
cosine_recall@3 0.9198
cosine_recall@5 0.9442
cosine_recall@10 0.9641
cosine_ndcg@10 0.9388
cosine_mrr@10 0.9393
cosine_map@100 0.9258
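
The accuracy@k, precision@k, and recall@k figures describe retrieving duplicate questions from a corpus by cosine similarity. A minimal sketch of that setup with sentence_transformers.util.semantic_search (corpus and query are illustrative stand-ins):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("CalebR84/stsb-distilbert-base-ocl")

corpus = [
    "How can you lose weight really quick?",
    "Which book is the best for algorithms and data structures?",
    "Why are there so many abandoned buildings in Mexico?",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode("How can I lose weight quickly?", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)[0]

# Each hit holds a corpus index and its cosine score, best first.
for hit in hits:
    print(f"{hit['score']:.4f} | {corpus[hit['corpus_id']]}")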

Training Details

Training Dataset

quora-duplicates

  • Dataset: quora-duplicates at 451a485
  • Size: 100,000 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    - sentence1: string; min 6 tokens, mean 15.56 tokens, max 62 tokens
    - sentence2: string; min 6 tokens, mean 15.73 tokens, max 84 tokens
    - label: int; 0: ~63.20%, 1: ~36.80%
  • Samples:
    - sentence1: "What are some of the greatest books not adapted into film yet?"
      sentence2: "What book should be made into a movie?"
      label: 0
    - sentence1: "How can I increase my communication skills?"
      sentence2: "How we improve our communication skills?"
      label: 1
    - sentence1: "Heymen I have a note5 it give me this message when a turn it on and shout down (custom pinary are blocked by frp lock) I try odin and kies butnot work?"
      sentence2: "Setup dubbing studio with very less budget in India?"
      label: 0
  • Loss: OnlineContrastiveLoss

Evaluation Dataset

quora-duplicates

  • Dataset: quora-duplicates at 451a485
  • Size: 1,000 evaluation samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    - sentence1: string; min 3 tokens, mean 15.37 tokens, max 62 tokens
    - sentence2: string; min 6 tokens, mean 15.63 tokens, max 78 tokens
    - label: int; 0: ~62.70%, 1: ~37.30%
  • Samples:
    - sentence1: "Which is the best book to learn data structures and algorithms?"
      sentence2: "Which book is the best book for algorithm and datastructure?"
      label: 1
    - sentence1: "Does modafinil shows up on a drug test? Because my urine smells a lot of medicine?"
      sentence2: "Can Modafinil come out in a drug test?"
      label: 0
    - sentence1: "Does the size of a penis matter?"
      sentence2: "Does penis size matters for girls?"
      label: 1
  • Loss: OnlineContrastiveLoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 10
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
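
Putting the dataset, loss, and non-default hyperparameters together, a minimal training sketch follows. The dataset id and subset ("sentence-transformers/quora-duplicates", "pair-class") and the exact train/eval split are assumptions inferred from the column names and sample counts above:

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import OnlineContrastiveLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("sentence-transformers/stsb-distilbert-base")

# Assumed dataset location; columns are sentence1, sentence2, label.
dataset = load_dataset("sentence-transformers/quora-duplicates", "pair-class", split="train")
train_dataset = dataset.select(range(100_000))
eval_dataset = dataset.select(range(100_000, 101_000))

loss = OnlineContrastiveLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="stsb-distilbert-base-ocl",
    num_train_epochs=10,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
    # Contrastive losses benefit from batches without repeated sentences.
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()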

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss quora-duplicates_cosine_ap quora-duplicates-dev_average_precision cosine_ndcg@10
0 0 - - 0.6905 0.4200 0.9397
0.0640 100 2.6402 - - - -
0.1280 200 2.4398 - - - -
0.1599 250 - 2.4217 0.7392 0.4765 0.9426
0.1919 300 2.2461 - - - -
0.2559 400 2.1433 - - - -
0.3199 500 2.0417 2.1120 0.7970 0.4566 0.9429
0.3839 600 2.0441 - - - -
0.4479 700 1.8907 - - - -
0.4798 750 - 2.0011 0.8229 0.4820 0.9468
0.5118 800 1.8985 - - - -
0.5758 900 1.7521 - - - -
0.6398 1000 1.8888 1.8010 0.8382 0.4925 0.9425
0.7038 1100 1.8524 - - - -
0.7678 1200 1.6956 - - - -
0.7997 1250 - 1.8004 0.8438 0.4283 0.9336
0.8317 1300 1.7519 - - - -
0.8957 1400 1.7515 - - - -
0.9597 1500 1.7288 1.7434 0.8352 0.5050 0.9428
1.0237 1600 1.533 - - - -
1.0877 1700 1.2543 - - - -
1.1196 1750 - 1.7109 0.8514 0.5299 0.9415
1.1516 1800 1.3201 - - - -
1.2156 1900 1.3309 - - - -
1.2796 2000 1.3256 1.7111 0.8528 0.5138 0.9393
1.3436 2100 1.2865 - - - -
1.4075 2200 1.2659 - - - -
1.4395 2250 - 1.7974 0.8468 0.5320 0.9390
1.4715 2300 1.2601 - - - -
1.5355 2400 1.3337 - - - -
1.5995 2500 1.3319 1.6922 0.8575 0.5399 0.9416
1.6635 2600 1.3232 - - - -
1.7274 2700 1.3684 - - - -
1.7594 2750 - 1.5772 0.8581 0.5592 0.9484
1.7914 2800 1.2706 - - - -
1.8554 2900 1.3186 - - - -
1.9194 3000 1.2336 1.5423 0.8656 0.5749 0.9433
1.9834 3100 1.2193 - - - -
2.0473 3200 0.868 - - - -
2.0793 3250 - 1.6575 0.8632 0.5735 0.9395
2.1113 3300 0.6411 - - - -
2.1753 3400 0.7127 - - - -
2.2393 3500 0.7044 1.5778 0.8718 0.5823 0.9387
2.3033 3600 0.6299 - - - -
2.3672 3700 0.7162 - - - -
2.3992 3750 - 1.6300 0.8595 0.5936 0.9414
2.4312 3800 0.6642 - - - -
2.4952 3900 0.6902 - - - -
2.5592 4000 0.7959 1.6070 0.8637 0.6006 0.9363
2.6232 4100 0.7588 - - - -
2.6871 4200 0.6925 - - - -
2.7191 4250 - 1.6787 0.8682 0.6006 0.9411
2.7511 4300 0.7226 - - - -
2.8151 4400 0.7507 - - - -
2.8791 4500 0.7563 1.6040 0.8658 0.6061 0.9416
2.9431 4600 0.7737 - - - -
3.0070 4700 0.6525 - - - -
3.0390 4750 - 1.6782 0.8652 0.5983 0.9401
3.0710 4800 0.3831 - - - -
3.1350 4900 0.297 - - - -
3.1990 5000 0.3725 1.7229 0.8588 0.6175 0.9418
3.2630 5100 0.4142 - - - -
3.3269 5200 0.4415 - - - -
3.3589 5250 - 1.6564 0.8635 0.6026 0.9379
3.3909 5300 0.3729 - - - -
3.4549 5400 0.4164 - - - -
3.5189 5500 0.3668 1.5964 0.8677 0.6105 0.9358
3.5829 5600 0.4184 - - - -
3.6468 5700 0.4311 - - - -
3.6788 5750 - 1.6523 0.8680 0.6130 0.9365
3.7108 5800 0.4222 - - - -
3.7748 5900 0.4302 - - - -
3.8388 6000 0.428 1.6625 0.8674 0.6163 0.9370
3.9028 6100 0.3898 - - - -
3.9667 6200 0.4255 - - - -
3.9987 6250 - 1.6145 0.8680 0.6118 0.9347
4.0307 6300 0.3456 - - - -
4.0947 6400 0.2265 - - - -
4.1587 6500 0.1913 1.7208 0.8595 0.6339 0.9433
4.2226 6600 0.2258 - - - -
4.2866 6700 0.2484 - - - -
4.3186 6750 - 1.6286 0.8600 0.6313 0.9394
4.3506 6800 0.1977 - - - -
4.4146 6900 0.2013 - - - -
4.4786 7000 0.2351 1.6910 0.8651 0.6193 0.9401
4.5425 7100 0.2356 - - - -
4.6065 7200 0.2542 - - - -
4.6385 7250 - 1.6955 0.8643 0.6129 0.9357
4.6705 7300 0.2592 - - - -
4.7345 7400 0.2585 - - - -
4.7985 7500 0.2375 1.7593 0.8647 0.6143 0.9325
4.8624 7600 0.2506 - - - -
4.9264 7700 0.2394 - - - -
4.9584 7750 - 1.6051 0.8720 0.6213 0.9350
4.9904 7800 0.2374 - - - -
5.0544 7900 0.1675 - - - -
5.1184 8000 0.131 1.5864 0.8673 0.6201 0.9377
5.1823 8100 0.1308 - - - -
5.2463 8200 0.1483 - - - -
5.2783 8250 - 1.5976 0.8698 0.6136 0.9359
5.3103 8300 0.1413 - - - -
5.3743 8400 0.1392 - - - -
5.4383 8500 0.1464 1.5980 0.8661 0.6267 0.9346
5.5022 8600 0.1781 - - - -
5.5662 8700 0.151 - - - -
5.5982 8750 - 1.5343 0.8756 0.6245 0.9352
5.6302 8800 0.1568 - - - -
5.6942 8900 0.1702 - - - -
5.7582 9000 0.1362 1.7121 0.8675 0.6230 0.9362
5.8221 9100 0.1371 - - - -
5.8861 9200 0.1381 - - - -
5.9181 9250 - 1.6326 0.8671 0.6122 0.9302
5.9501 9300 0.1691 - - - -
6.0141 9400 0.1701 - - - -
6.0781 9500 0.0935 1.5705 0.8709 0.6066 0.9293
6.1420 9600 0.0852 - - - -
6.2060 9700 0.0874 - - - -
6.2380 9750 - 1.5643 0.8724 0.6061 0.9307
6.2700 9800 0.0889 - - - -
6.3340 9900 0.0972 - - - -
6.3980 10000 0.1011 1.5622 0.8736 0.6153 0.9328
6.4619 10100 0.0962 - - - -
6.5259 10200 0.1259 - - - -
6.5579 10250 - 1.5406 0.8687 0.6293 0.9373
6.5899 10300 0.0925 - - - -
6.6539 10400 0.1138 - - - -
6.7179 10500 0.0788 1.5450 0.8658 0.6226 0.9349
6.7818 10600 0.1112 - - - -
6.8458 10700 0.0922 - - - -
6.8778 10750 - 1.5063 0.8736 0.6245 0.9370
6.9098 10800 0.1173 - - - -
6.9738 10900 0.1141 - - - -
7.0377 11000 0.0637 1.5007 0.8741 0.6270 0.9379
7.1017 11100 0.0713 - - - -
7.1657 11200 0.0754 - - - -
7.1977 11250 - 1.5081 0.8725 0.6273 0.9376
7.2297 11300 0.04 - - - -
7.2937 11400 0.0695 - - - -
7.3576 11500 0.034 1.5598 0.8710 0.6179 0.9350
7.4216 11600 0.0513 - - - -
7.4856 11700 0.0749 - - - -
7.5176 11750 - 1.6118 0.8694 0.6264 0.9380
7.5496 11800 0.0708 - - - -
7.6136 11900 0.0939 - - - -
7.6775 12000 0.059 1.6282 0.8708 0.6271 0.9354
7.7415 12100 0.0847 - - - -
7.8055 12200 0.0521 - - - -
7.8375 12250 - 1.5478 0.8683 0.6359 0.9388
7.8695 12300 0.0394 - - - -
7.9335 12400 0.0619 - - - -
7.9974 12500 0.0593 1.5440 0.8771 0.6387 0.9393
8.0614 12600 0.0292 - - - -
8.1254 12700 0.0267 - - - -
8.1574 12750 - 1.5419 0.8773 0.6290 0.9388
8.1894 12800 0.0334 - - - -
8.2534 12900 0.05 - - - -
8.3173 13000 0.0439 1.5589 0.8740 0.6322 0.9384
8.3813 13100 0.0409 - - - -
8.4453 13200 0.03 - - - -
8.4773 13250 - 1.5472 0.8730 0.6347 0.9398
8.5093 13300 0.0373 - - - -
8.5733 13400 0.0404 - - - -
8.6372 13500 0.0357 1.5332 0.8749 0.6327 0.9404
8.7012 13600 0.023 - - - -
8.7652 13700 0.0256 - - - -
8.7972 13750 - 1.5154 0.8781 0.6337 0.9379
8.8292 13800 0.0563 - - - -
8.8932 13900 0.029 - - - -
8.9571 14000 0.0395 1.5503 0.8771 0.6344 0.9390
9.0211 14100 0.0296 - - - -
9.0851 14200 0.0308 - - - -
9.1171 14250 - 1.5385 0.8771 0.6363 0.9391
9.1491 14300 0.035 - - - -
9.2131 14400 0.0217 - - - -
9.2770 14500 0.0192 1.5592 0.8777 0.6373 0.9393
9.3410 14600 0.0369 - - - -
9.4050 14700 0.0186 - - - -
9.4370 14750 - 1.5626 0.8771 0.6368 0.9389
9.4690 14800 0.0303 - - - -
9.5329 14900 0.0181 - - - -
9.5969 15000 0.0217 1.5466 0.8782 0.6387 0.9390
9.6609 15100 0.0463 - - - -
9.7249 15200 0.0211 - - - -
9.7569 15250 - 1.5440 0.8772 0.6401 0.9395
9.7889 15300 0.0216 - - - -
9.8528 15400 0.0328 - - - -
9.9168 15500 0.0154 1.5399 0.8773 0.6393 0.9388
9.9808 15600 0.0263 - - - -

Framework Versions

  • Python: 3.12.9
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.7.0+cu126
  • Accelerate: 1.7.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}