SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: https://sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
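
These settings can be verified on the loaded model itself. The snippet below is a minimal sketch that reads the maximum sequence length and the embedding dimensionality from the checkpoint named in the Usage section:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jaimevera1107/all-MiniLM-L12-v2-pubmed")

# Maximum number of tokens the Transformer module accepts (longer inputs are truncated)
print(model.max_seq_length)                      # 256
# Dimensionality of the pooled (and normalized) sentence embeddings
print(model.get_sentence_embedding_dimension())  # 384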

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("jaimevera1107/all-MiniLM-L12-v2-pubmed")
# Run inference
sentences = [
    "What were the findings of the study on cyclic 3',5'-nucleotide phosphodiesterase in bovine thyroid regarding its activity and the factors influencing it?",
    "The study investigated the properties of cyclic 3',5'-nucleotide phosphodiesterase in bovine thyroid, revealing that its activity is stimulated by Mg2+ and requires a Ca2+-dependent activating factor, with distinct enzyme forms and kinetic behaviors observed.",
    '[The kinetics of lithium in the rat serum, brain and liver]. The kinetics of lithium in the serum, liver and brain of rats is described. The serum levels resembled those of man, whereas considerable quantitative differences were observed when comparing specific kinetic parameters. The brain level increased with the increasing doses, approaching the corresponding serum level. Concentration differences between different brain areas could be observed only after repeated administrations. Striatum, cortex and hippocampus showed significantly higher levels than the thalamus. The liver content remained low with increasing doses, and was below the brain level.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
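
Because the embeddings are L2-normalized by the final Normalize module and the similarity function is cosine similarity, the same model can be used for simple semantic search over a small corpus. A minimal sketch follows; the query and corpus strings are illustrative and not taken from the training data:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jaimevera1107/all-MiniLM-L12-v2-pubmed")

corpus = [
    "Lithium kinetics in rat serum, brain and liver after repeated administration.",
    "Cyclic nucleotide phosphodiesterase in bovine thyroid is stimulated by Mg2+.",
    "Alpha adrenoceptor blockade in asthmatic patients with migraine.",
]
query = "Which factors influence phosphodiesterase activity in the thyroid?"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Cosine-similarity scores between the query and every corpus entry, shape [1, 3]
scores = model.similarity(query_embedding, corpus_embeddings)
best = scores.argmax().item()
print(corpus[best], float(scores[0, best]))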

Training Details

Training Dataset

Unnamed Dataset

  • Size: 67,560 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:

             sentence_0      sentence_1      label
    type     string          string          float
    min      7 tokens        6 tokens        0.0
    mean     74.75 tokens    59.33 tokens    0.68
    max      256 tokens      256 tokens      1.0
  • Samples:
    Sample 1
      sentence_0: What were the outcomes for asthmatic patients in a trial of indoramin regarding airflow improvement and migraine frequency reduction?
      sentence_1: Long-term trial of an alpha adrenoceptor blocking drug (Indoramin) in asthma. A preliminary report. Eight patients suffering from both asthma and migraine underwent a clinical trial for 3 months of indoramin, an alpha adrenoceptor antagonist with antihistamine and antiserotonin activity. Patients were told indoramin was prescribed for migraine prophylaxis. In three asthmatic patients there was a marked increase in airflow meter (AFM) readings which were recorded daily, the remaining five showing no significant change or a decrease in AFM readings. Indoramin did not appear to potentiate the action of the beta sympathomimetic aerosols. It is suggested that a small population of asthmatic patients may derive therapeutic benefit from an alpha adrenoceptor antagonist. Seven of the eight patients experienced a 50% decrease in the frequency of their migraine headaches.
      label: 0.5

    Sample 2
      sentence_0: The ontogeny of L-alpha-hydroxyacid oxidase isozymes in the mouse. Mouse liver hydroxyacid oxidase isozymes are present at low levels at birth and increase in activity until day 13 after which HAOX-B almost disappears and HZOS-A is reduced to approximately one half the maximum level in the adult kidney. HAOX-G appears near day 13 post partum and increases until adult levels are reached, the female having four times the activity of the male. The pregnant female has significantly lower levels of HAOX-A and HAOX-B in the liver and higher activity of HAOX-B in the kidney. Developmental changes occur in the extent of epigenetic modification of mouse liver HAOX-A during the early neonatal period.
      sentence_1: In mice, liver hydroxyacid oxidase isozymes show developmental changes, with HAOX-B decreasing after day 13 and HAOX-G increasing, while pregnant females exhibit lower HAOX-A and HAOX-B levels in the liver but higher HAOX-B activity in the kidney.
      label: 1.0

    Sample 3
      sentence_0: What were the findings regarding renal inflammation and leptospires in a study of striped skunks from Louisiana?
      sentence_1: In a study of 100 striped skunks from Louisiana, 50% exhibited renal inflammation, and 10% with severe lesions showed azotemia, while leptospires were cultured from 30% of the skunks.
      label: 0.5
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
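
CosineSimilarityLoss regresses the cosine similarity of the two sentence embeddings onto the float label using the MSE criterion listed above. A minimal sketch of that computation is shown below (during training this is handled by sentence_transformers.losses.CosineSimilarityLoss itself):

import torch
import torch.nn.functional as F

def cosine_similarity_loss(emb_a: torch.Tensor, emb_b: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Predicted score: cosine similarity between the paired sentence embeddings
    pred = F.cosine_similarity(emb_a, emb_b, dim=1)
    # Regress the predicted score onto the gold label (0.0 to 1.0) with MSE
    return F.mse_loss(pred, labels)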
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • num_train_epochs: 4
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
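
Put together, a training run with these non-default hyperparameters could look roughly like the sketch below. The output directory and the one-row dataset are placeholders; the actual run used the 67,560-pair dataset described above:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CosineSimilarityLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
loss = CosineSimilarityLoss(model)

# Placeholder dataset with the same columns as the training data (sentence_0, sentence_1, label)
train_dataset = Dataset.from_dict({
    "sentence_0": ["What stimulates phosphodiesterase activity in bovine thyroid?"],
    "sentence_1": ["Phosphodiesterase activity in bovine thyroid is stimulated by Mg2+."],
    "label": [1.0],
})

args = SentenceTransformerTrainingArguments(
    output_dir="all-MiniLM-pubmed",        # placeholder output directory
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=4,
    fp16=True,                             # requires a CUDA GPU
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()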

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss
0.2367 500 0.0659
0.4735 1000 0.042
0.7102 1500 0.0351
0.9470 2000 0.0328
1.1837 2500 0.0291
1.4205 3000 0.0269
1.6572 3500 0.0265
1.8939 4000 0.026
2.1307 4500 0.0245
2.3674 5000 0.0231
2.6042 5500 0.0219
2.8409 6000 0.0229
3.0777 6500 0.0227
3.3144 7000 0.0206
3.5511 7500 0.02
3.7879 8000 0.0201

Framework Versions

  • Python: 3.11.9
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.3
  • PyTorch: 2.7.0+cu118
  • Accelerate: 1.7.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}