SentenceTransformer based on distilbert/distilbert-base-uncased

This is a sentence-transformers model finetuned from distilbert/distilbert-base-uncased. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: distilbert/distilbert-base-uncased
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DistilBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
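
The Pooling module above has pooling_mode_mean_tokens set to True, so each sentence embedding is the average of the token embeddings, with padding positions left out. The following is a minimal sketch of that idea in plain Python, not the library's implementation. The function name mean_pool and the toy 3-dimensional vectors (standing in for the 768-dimensional ones) are assumptions made for illustration.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid dividing by zero.
    count = max(count, 1)
    return [value / count for value in summed]


# Toy example: three real tokens followed by one padding token (mask 0).
tokens = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 2.0, 2.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)  # [2.0, 2.0, 2.0]
```

The padding vector does not affect the result because its mask entry is 0.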

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("pjbhaumik/biencoder-finetune-model-v9")
# Run inference
sentences = [
    'pets in cargo',
    'can a pet travel in cargo',
    'baggage exceptions for Amex',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
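
The card lists cosine similarity as the similarity function, so the 3 x 3 matrix above holds one cosine score per sentence pair. As a rough illustration of what such a matrix contains, here is a plain-Python sketch. The helper names and the 2-dimensional toy vectors are assumptions, not model output.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Pairwise cosine similarity between every pair of vectors."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy vectors (hypothetical): the first two point in similar directions,
# the first and third are orthogonal.
toy_embeddings = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
matrix = similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(score, 3) for score in row])
```

Each row can be sorted in descending order to rank the other sentences against the row's sentence, which is the basis of a simple semantic search.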

Evaluation

Metrics

Semantic Similarity

Metric Value
pearson_cosine nan
spearman_cosine nan
pearson_manhattan nan
spearman_manhattan nan
pearson_euclidean nan
spearman_euclidean nan
pearson_dot nan
spearman_dot nan
pearson_max nan
spearman_max nan
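
Every metric above is nan. One plausible explanation, assuming the evaluation labels are constant in the same way as the training labels (the label column is 100% "1"), is that a correlation coefficient is undefined when one of the two variables has zero variance. The sketch below shows this with a hand-written Pearson correlation. The function and the example scores are hypothetical and are not taken from the evaluation run.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns nan when either input has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    if denom == 0.0:
        # The ratio cov / denom would be 0 / 0, which is undefined.
        return float("nan")
    return cov / denom


predicted_scores = [0.9, 0.7, 0.8]  # made-up cosine similarities
constant_labels = [1, 1, 1]         # every pair labeled 1
print(pearson(predicted_scores, constant_labels))  # nan
```

If this is the cause, an evaluation set with both similar and dissimilar pairs (or graded similarity labels) would be needed to obtain meaningful correlation values.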

Training Details

Training Dataset

Unnamed Dataset

  • Size: 15,488 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    sentence_0 (string): min 4 tokens, mean 10.4 tokens, max 47 tokens
    sentence_1 (string): min 4 tokens, mean 10.14 tokens, max 37 tokens
    label (int): 1: 100.00%
  • Samples:
    sentence_0 | sentence_1 | label
    how to use a companion certificate on delta.com | SHOPPING ON DELTA.COM FOR AMEX CERT | 1
    is jamaica can be booked with companion certificate | what areas can the American Express companion certificate be applied to | 1
    how do i book award travel on klm | can you book an air france ticket with miles | 1
  • Loss: MultipleNegativesSymmetricRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
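
To make the loss parameters concrete: this loss treats the other pairs in a batch as negatives and applies cross-entropy in both directions (sentence_0 to sentence_1 and back), after multiplying the cosine similarities by the scale of 20. The following is a simplified plain-Python sketch of that computation under those assumptions, not the library code. The function names and the toy similarity matrices are hypothetical.

```python
import math


def cross_entropy(logits, target_index):
    """Negative log-softmax of the target entry, computed stably."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[target_index]


def symmetric_ranking_loss(similarities, scale=20.0):
    """similarities[i][j]: cosine similarity of sentence_0[i] and sentence_1[j].

    The matching pair for row i is column i; all other columns act as negatives.
    """
    n = len(similarities)
    scores = [[scale * s for s in row] for row in similarities]
    transposed = [[scores[i][j] for i in range(n)] for j in range(n)]
    forward = sum(cross_entropy(scores[i], i) for i in range(n)) / n
    backward = sum(cross_entropy(transposed[j], j) for j in range(n)) / n
    return (forward + backward) / 2


# Toy batches of two pairs (made-up similarities).
well_separated = [[0.9, 0.1], [0.2, 0.8]]
confused = [[0.5, 0.5], [0.5, 0.5]]
print(symmetric_ranking_loss(well_separated))
print(symmetric_ranking_loss(confused))
```

The loss is small when each sentence is most similar to its own partner and rises to log(2) for a two-pair batch when the model cannot tell the pairs apart.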
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 12
  • multi_dataset_batch_sampler: round_robin
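
These settings, together with the dataset size of 15,488 samples, determine the step counts that appear in the training logs (step 968 at epoch 1.0, step 11616 at epoch 12.0). The short calculation below reproduces them, assuming a single device and gradient_accumulation_steps of 1 as listed in the full hyperparameters.

```python
import math

train_samples = 15_488
batch_size = 16   # per_device_train_batch_size, single device assumed
epochs = 12       # num_train_epochs

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 968 11616

# Fraction of an epoch completed at step 100, as shown in the logs.
print(round(100 / steps_per_epoch, 4))  # 0.1033
```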

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 12
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss eval_examples_spearman_max
0.1033 100 - nan
0.2066 200 - nan
0.3099 300 - nan
0.4132 400 - nan
0.5165 500 0.7655 nan
0.6198 600 - nan
0.7231 700 - nan
0.8264 800 - nan
0.9298 900 - nan
1.0 968 - nan
1.0331 1000 0.3727 nan
1.1364 1100 - nan
1.2397 1200 - nan
1.3430 1300 - nan
1.4463 1400 - nan
1.5496 1500 0.2686 nan
1.6529 1600 - nan
1.7562 1700 - nan
1.8595 1800 - nan
1.9628 1900 - nan
2.0 1936 - nan
2.0661 2000 0.2709 nan
2.1694 2100 - nan
2.2727 2200 - nan
2.3760 2300 - nan
2.4793 2400 - nan
2.5826 2500 0.231 nan
2.6860 2600 - nan
2.7893 2700 - nan
2.8926 2800 - nan
2.9959 2900 - nan
3.0 2904 - nan
3.0992 3000 0.2461 nan
3.2025 3100 - nan
3.3058 3200 - nan
3.4091 3300 - nan
3.5124 3400 - nan
3.6157 3500 0.2181 nan
3.7190 3600 - nan
3.8223 3700 - nan
3.9256 3800 - nan
4.0 3872 - nan
4.0289 3900 - nan
4.1322 4000 0.2288 nan
4.2355 4100 - nan
4.3388 4200 - nan
4.4421 4300 - nan
4.5455 4400 - nan
4.6488 4500 0.2123 nan
4.7521 4600 - nan
4.8554 4700 - nan
4.9587 4800 - nan
5.0 4840 - nan
5.0620 4900 - nan
5.1653 5000 0.2254 nan
5.2686 5100 - nan
5.3719 5200 - nan
5.4752 5300 - nan
5.5785 5400 - nan
5.6818 5500 0.2077 nan
5.7851 5600 - nan
5.8884 5700 - nan
5.9917 5800 - nan
6.0 5808 - nan
6.0950 5900 - nan
6.1983 6000 0.218 nan
6.3017 6100 - nan
6.4050 6200 - nan
6.5083 6300 - nan
6.6116 6400 - nan
6.7149 6500 0.206 nan
6.8182 6600 - nan
6.9215 6700 - nan
7.0 6776 - nan
7.0248 6800 - nan
7.1281 6900 - nan
7.2314 7000 0.2126 nan
7.3347 7100 - nan
7.4380 7200 - nan
7.5413 7300 - nan
7.6446 7400 - nan
7.7479 7500 0.2065 nan
7.8512 7600 - nan
7.9545 7700 - nan
8.0 7744 - nan
8.0579 7800 - nan
8.1612 7900 - nan
8.2645 8000 0.2068 nan
8.3678 8100 - nan
8.4711 8200 - nan
8.5744 8300 - nan
8.6777 8400 - nan
8.7810 8500 0.2014 nan
8.8843 8600 - nan
8.9876 8700 - nan
9.0 8712 - nan
9.0909 8800 - nan
9.1942 8900 - nan
9.2975 9000 0.2057 nan
9.4008 9100 - nan
9.5041 9200 - nan
9.6074 9300 - nan
9.7107 9400 - nan
9.8140 9500 0.1969 nan
9.9174 9600 - nan
10.0 9680 - nan
10.0207 9700 - nan
10.1240 9800 - nan
10.2273 9900 - nan
10.3306 10000 0.2023 nan
10.4339 10100 - nan
10.5372 10200 - nan
10.6405 10300 - nan
10.7438 10400 - nan
10.8471 10500 0.1946 nan
10.9504 10600 - nan
11.0 10648 - nan
11.0537 10700 - nan
11.1570 10800 - nan
11.2603 10900 - nan
11.3636 11000 0.1982 nan
11.4669 11100 - nan
11.5702 11200 - nan
11.6736 11300 - nan
11.7769 11400 - nan
11.8802 11500 0.1919 nan
11.9835 11600 - nan
12.0 11616 - nan

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.0.0
  • Transformers: 4.41.2
  • PyTorch: 2.1.0
  • Accelerate: 0.30.1
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

Model size: 66.4M params (Safetensors, tensor type F32)