SentenceTransformer based on jxm/cde-small-v2

This is a sentence-transformers model fine-tuned from jxm/cde-small-v2. It maps sentences and paragraphs to a dense vector space (the card metadata does not record the dimensionality) and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: jxm/cde-small-v2
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: not recorded (None in the card metadata)
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({}) with Transformer model: ContextualDocumentEmbeddingTransformer 
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

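# Download from the 🤗 Hub
# Assumption: the base model uses a custom architecture
# (ContextualDocumentEmbeddingTransformer), so loading may require
# passing trust_remote_code=True.
model = SentenceTransformer("BlackBeenie/cde-small-v2-biencoder-msmarco")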
# Run inference
sentences = [
    'when did jeepers creepers come out',
    'Jeepers Creepers Wiki. Creeper. Creeper is a fictional character and the main antagonist in the 2001 horror film Jeepers Creepers and its 2003 sequel Jeepers Creepers II. It is an ancient, mysterious demon who viciously feeds on the flesh and bones of many human beings for 23 days every 23rd spring.',
    ' Creep  is a song by the English alternative rock band Radiohead. Radiohead released Creep as their debut single in 1992, and it later appeared on their first album, Pablo Honey (1993). During its initial release, Creep was not a chart success.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, D), where D is the model's embedding dimension (not recorded in this card)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
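
The similarity scores above can be used to rank candidate passages for a query. The following is a minimal sketch of that ranking step using plain NumPy and cosine similarity, the similarity function listed for this model. The helper names (cosine_similarity_matrix, rank_passages) and the toy 4-dimensional vectors are hypothetical stand-ins for real model.encode(...) output, so the sketch runs without downloading the model.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T


def rank_passages(query_emb: np.ndarray, passage_embs: np.ndarray):
    """Return (passage_index, score) pairs sorted from most to least similar."""
    scores = cosine_similarity_matrix(query_emb[None, :], passage_embs)[0]
    order = np.argsort(-scores)
    return [(int(i), float(scores[i])) for i in order]


# Hypothetical toy embeddings; real ones would come from model.encode(...).
toy_query = np.array([1.0, 0.0, 1.0, 0.0])
toy_passages = np.array([
    [0.9, 0.1, 0.8, 0.0],  # points in nearly the same direction as the query
    [0.0, 1.0, 0.0, 1.0],  # orthogonal to the query
])

ranking = rank_passages(toy_query, toy_passages)
print(ranking)
```

With real embeddings, the first row of embeddings would typically be the query and the remaining rows the candidate passages.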

Training Details

Training Dataset

Unnamed Dataset

  • Size: 499,184 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:
    column      type    min        mean          max
    sentence_0  string  4 tokens   9.26 tokens   29 tokens
    sentence_1  string  14 tokens  81.55 tokens  203 tokens
    sentence_2  string  16 tokens  80.95 tokens  231 tokens
  • Samples:
    sentence_0: what year did the sandy hook incident happen
    sentence_1: For Newtown, 2012 Sandy Hook Elementary School shooting is still painful. It's been three years since the terrible day Jimmy Greene’s 6-year-old daughter, Ana Grace Marquez, and 19 other children were murdered in the mass shooting at Sandy Hook Elementary School. But life without Ana, who loved to sing and dance from room to room, continues to be so hard that, in some ways, Dec. 14 is no tougher than any other day for Greene.
    sentence_2: Hook is a 1991 Steven Spielberg film starring Dustin Hoffman and Robin Williams. The film's storyline is based on the books written by Sir James Matthew Barrie in 1904 or 1905 and is the sequel to the first book.

    sentence_0: what kind of degree do you need to be a medical assistant?
    sentence_1: If you choose this path, here is what you need to do: 1 Have a high school diploma or GED. The minimum educational requirement for medical assistants is a high school diploma or equivalency degree. 2 Find a doctor who will provide training.
    sentence_2: Many colleges offer two-year associate's degrees or one-year certificate programs in different areas of medical office technology. Certificate areas include billing specialist, medical administrative assistant, and medical transcriptionist. Because of the complexity of medical jargon and operational procedures, many employers prefer these professionals to hold related two-year degrees or complete one-year training programs.

    sentence_0: what does usb cord do
    sentence_1: The Flash Player is required to see this video. The term USB stands for Universal Serial Bus. USB cable assemblies are some of the most popular cable types available, used mostly to connect computers to peripheral devices such as cameras, camcorders, printers, scanners, and more.
    sentence_2: Devices manufactured to the current USB Revision 3.0 specification are backward compatible with version 1.1. The USB 2.0 specification for a Full-Speed/High-Speed cable calls for four wires, two for data and two for power, and a braided outer shield. The USB 3.0 specification calls for a total of 10 wires plus a braided outer shield. Two wires are used for power.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
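
To illustrate what the loss parameters above control, here is a minimal NumPy sketch of a multiple-negatives ranking objective for (anchor, positive, negative) triplets: each anchor is scored against every positive and every negative in the batch with cosine similarity, the scores are multiplied by scale, and cross-entropy is taken with the anchor's own positive as the target. This is an illustrative re-implementation under those assumptions, not the library's code; the function name mnrl_loss and the toy 2-dimensional vectors are hypothetical.

```python
import numpy as np


def mnrl_loss(anchors: np.ndarray, positives: np.ndarray,
              negatives: np.ndarray, scale: float = 20.0) -> float:
    """Cross-entropy over scaled cosine similarities with in-batch negatives."""

    def normalize(x: np.ndarray) -> np.ndarray:
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    # Candidates for every anchor: all positives, then all hard negatives.
    candidates = np.concatenate([positives, negatives], axis=0)
    logits = scale * normalize(anchors) @ normalize(candidates).T  # (B, 2B)

    # Numerically stable log-softmax over the candidate axis.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The correct candidate for anchor i is positive i (column i).
    idx = np.arange(len(anchors))
    return float(-log_probs[idx, idx].mean())


# Hypothetical toy batch of two triplets.
toy_anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
toy_negatives = np.array([[-1.0, 0.0], [0.0, -1.0]])

aligned_loss = mnrl_loss(toy_anchors, toy_anchors.copy(), toy_negatives)
swapped_loss = mnrl_loss(toy_anchors, toy_anchors[::-1].copy(), toy_negatives)
print(aligned_loss, swapped_loss)
```

The loss is close to zero when each anchor matches its own positive and becomes large when the positives are swapped between anchors, which is the behavior the scale of 20.0 sharpens.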
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • fp16: True
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss
0.0321 500 0.9856
0.0641 1000 0.4499
0.0962 1500 0.3673
0.1282 2000 0.339
0.1603 2500 0.3118
0.1923 3000 0.2929
0.2244 3500 0.2886
0.2564 4000 0.2771
0.2885 4500 0.2762
0.3205 5000 0.2716
0.3526 5500 0.2585
0.3846 6000 0.2631
0.4167 6500 0.2458
0.4487 7000 0.2496
0.4808 7500 0.252
0.5128 8000 0.2399
0.5449 8500 0.2422
0.5769 9000 0.2461
0.6090 9500 0.2314
0.6410 10000 0.2331
0.6731 10500 0.2314
0.7051 11000 0.2302
0.7372 11500 0.235
0.7692 12000 0.2176
0.8013 12500 0.2201
0.8333 13000 0.2206
0.8654 13500 0.222
0.8974 14000 0.2136
0.9295 14500 0.2108
0.9615 15000 0.2102
0.9936 15500 0.2098
1.0256 16000 0.1209
1.0577 16500 0.099
1.0897 17000 0.0944
1.1218 17500 0.0955
1.1538 18000 0.0947
1.1859 18500 0.0953
1.2179 19000 0.0943
1.25 19500 0.0911
1.2821 20000 0.0964
1.3141 20500 0.0933
1.3462 21000 0.0956
1.3782 21500 0.0941
1.4103 22000 0.0903
1.4423 22500 0.0889
1.4744 23000 0.0919
1.5064 23500 0.0917
1.5385 24000 0.0956
1.5705 24500 0.0903
1.6026 25000 0.0931
1.6346 25500 0.0931
1.6667 26000 0.089
1.6987 26500 0.0892
1.7308 27000 0.091
1.7628 27500 0.0892
1.7949 28000 0.0884
1.8269 28500 0.0889
1.8590 29000 0.0877
1.8910 29500 0.0866
1.9231 30000 0.0853
1.9551 30500 0.085
1.9872 31000 0.0867
2.0192 31500 0.055
2.0513 32000 0.0338
2.0833 32500 0.033
2.1154 33000 0.033
2.1474 33500 0.0317
2.1795 34000 0.0323
2.2115 34500 0.0322
2.2436 35000 0.0316
2.2756 35500 0.0314
2.3077 36000 0.0312
2.3397 36500 0.0324
2.3718 37000 0.0324
2.4038 37500 0.0328
2.4359 38000 0.0311
2.4679 38500 0.0312
2.5 39000 0.0312
2.5321 39500 0.0311
2.5641 40000 0.0315
2.5962 40500 0.0308
2.6282 41000 0.0308
2.6603 41500 0.0306
2.6923 42000 0.0313
2.7244 42500 0.0322
2.7564 43000 0.0315
2.7885 43500 0.0311
2.8205 44000 0.0321
2.8526 44500 0.0318
2.8846 45000 0.0305
2.9167 45500 0.031
2.9487 46000 0.032
2.9808 46500 0.0306
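
The Epoch column in the log above can be reproduced from the step count, the training set size, and the batch size. The sketch below assumes a single device, no gradient accumulation, and that the final partial batch of each epoch is kept; the function name epoch_at_step is hypothetical.

```python
import math

TRAIN_SAMPLES = 499_184
BATCH_SIZE = 32

# Optimizer steps needed to see every training sample once.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)


def epoch_at_step(step: int) -> float:
    """Fractional epoch reached after the given number of optimizer steps."""
    return round(step / steps_per_epoch, 4)


# Compare against a few (step, logged epoch) rows from the training log.
logged_rows = [(500, 0.0321), (15500, 0.9936), (39000, 2.5), (46500, 2.9808)]
for step, logged_epoch in logged_rows:
    print(step, epoch_at_step(step), logged_epoch)
```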

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 3.4.1
  • Transformers: 4.50.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 306M params (Safetensors, tensor type F32)