SentenceTransformer based on BAAI/bge-base-en-v1.5

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
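
The Pooling block above is configured for mean pooling (pooling_mode_mean_tokens: True), i.e. the sentence embedding is the average of the BERT token embeddings over non-padding tokens. A minimal sketch of that step in plain PyTorch (illustrative only, not the library's internal code):

import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 768) hidden states from the BERT backbone
    # attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)    # sum over non-padding tokens
    counts = mask.sum(dim=1).clamp(min=1e-9)         # number of real tokens per example
    return summed / counts                           # (batch, 768) sentence embeddings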

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("GaniduA/bge-finetuned-olscience")
# Run inference
sentences = [
    'Discuss the principles and process of electrolysis, including the conventions adopted in electrolysis.',
    'The development of artificial intelligence has significantly impacted the tech industry, leading to advancements in machine learning and natural language processing.',
    "In the movie 'Inception', directed by Christopher Nolan, the plot revolves around a skilled thief who is given a chance at redemption if he can successfully perform inception by planting an idea into someone's subconscious.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
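
The same embeddings can also drive semantic search over a corpus. A minimal sketch using sentence_transformers.util.semantic_search (the corpus and query below are illustrative, not taken from the training data):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("GaniduA/bge-finetuned-olscience")

# Illustrative corpus of candidate passages
corpus = [
    "Electrolysis decomposes an ionic compound by passing an electric current through it.",
    "Zinc displaces copper from copper sulfate solution in a single displacement reaction.",
    "A photodiode converts light into an electrical current.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode("Explain the process of electrolysis.", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 4))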

Evaluation

Metrics

Binary Classification

Metric Value
cosine_accuracy 1.0
cosine_accuracy_threshold 0.0571
cosine_f1 1.0
cosine_f1_threshold 0.0571
cosine_precision 1.0
cosine_recall 1.0
cosine_ap 1.0
cosine_mcc 1.0
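
These binary classification metrics come from thresholding the cosine similarity of sentence pairs. As a hedged sketch, a pair could be labelled relevant when its similarity exceeds the reported cosine_accuracy_threshold of 0.0571 (the example pair below is illustrative):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("GaniduA/bge-finetuned-olscience")

threshold = 0.0571  # reported cosine_accuracy_threshold above
embeddings = model.encode([
    "Describe the operation of a photodiode in optical sensing.",
    "A photodiode converts light into an electrical current when exposed to light.",
])
score = model.similarity(embeddings[0:1], embeddings[1:2]).item()
predicted_label = 1.0 if score > threshold else 0.0
print(round(score, 4), predicted_label)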

Training Details

Training Dataset

Unnamed Dataset

  • Size: 34,969 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (string): min 6 tokens, mean 17.43 tokens, max 209 tokens
    • sentence_1 (string): min 3 tokens, mean 25.94 tokens, max 335 tokens
    • label (float): min 0.0, mean 0.25, max 1.0
  • Samples:
    • sentence_0: How does the reaction of zinc with copper sulfate demonstrate a single displacement reaction?
      sentence_1: Julius Caesar crossed the Rubicon River in 49 BC, which led to a chain of events culminating in the Roman Civil War.
      label: 0.0
    • sentence_0: How do you investigate the effect of tightening a screw on the moment of force required to rotate a stick?
      sentence_1: Explore the depths of the ocean with a team of deep-sea divers searching for mythical sea creatures and undiscovered shipwrecks.
      label: 0.0
    • sentence_0: Describe the operation of a photodiode in optical sensing.
      sentence_1: A photodiode converts light into an electrical current by generating electron-hole pairs when exposed to light, used in optical sensing and communication applications.
      label: 1.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
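
In other words, CosineSimilarityLoss pushes the cosine similarity of each pair towards its label (1.0 for relevant pairs, 0.0 for unrelated ones) under an MSE objective. A rough sketch of that objective in plain PyTorch (not the library's implementation):

import torch
import torch.nn.functional as F

def cosine_similarity_loss(emb_a: torch.Tensor, emb_b: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # emb_a, emb_b: (batch, 768) sentence embeddings; labels: (batch,) floats in [0, 1]
    cos = F.cosine_similarity(emb_a, emb_b, dim=1)
    return F.mse_loss(cos, labels)  # corresponds to the MSELoss loss_fct shown above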
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 2
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
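
As a hedged reconstruction (not the exact training script), these non-default hyperparameters roughly correspond to a Sentence Transformers v3 training setup like the one below. The output directory and the inline example rows are assumptions; the real dataset follows the sentence_0 / sentence_1 / label schema described above.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CosineSimilarityLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# Hypothetical stand-in for the 34,969-pair training set (columns: sentence_0, sentence_1, label)
train_dataset = Dataset.from_dict({
    "sentence_0": ["Describe the operation of a photodiode in optical sensing."],
    "sentence_1": ["A photodiode converts light into an electrical current."],
    "label": [1.0],
})

args = SentenceTransformerTrainingArguments(
    output_dir="bge-finetuned-olscience",  # assumed output path
    num_train_epochs=2,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    fp16=True,  # requires a GPU
    # eval_strategy="steps" was also set; it needs an eval dataset/evaluator, omitted in this sketch
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=CosineSimilarityLoss(model),
)
trainer.train()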

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss eval_cosine_ap
0.0366 20 - 0.9892
0.0731 40 - 0.9978
0.1097 60 - 0.9989
0.1463 80 - 0.9997
0.1828 100 - 0.9999
0.2194 120 - 0.9998
0.2559 140 - 0.9998
0.2925 160 - 0.9998
0.3291 180 - 0.9998
0.3656 200 - 0.9999
0.4022 220 - 0.9998
0.4388 240 - 0.9999
0.4753 260 - 1.0000
0.5119 280 - 1.0000
0.5484 300 - 1.0000
0.5850 320 - 1.0000
0.6216 340 - 1.0000
0.6581 360 - 1.0000
0.6947 380 - 1.0000
0.7313 400 - 1.0000
0.7678 420 - 1.0000
0.8044 440 - 1.0000
0.8410 460 - 1.0000
0.8775 480 - 1.0000
0.9141 500 0.0199 1.0000
0.9506 520 - 1.0000
0.9872 540 - 1.0000
1.0 547 - 1.0000
1.0238 560 - 1.0000
1.0603 580 - 1.0000
1.0969 600 - 1.0000
1.1335 620 - 1.0000
1.1700 640 - 1.0000
1.2066 660 - 1.0000
1.2431 680 - 1.0000
1.2797 700 - 1.0000
1.3163 720 - 1.0000
1.3528 740 - 1.0000
1.3894 760 - 1.0000
1.4260 780 - 1.0000
1.4625 800 - 1.0000
1.4991 820 - 1.0000
1.5356 840 - 1.0000
1.5722 860 - 1.0000
1.6088 880 - 1.0000
1.6453 900 - 1.0000
1.6819 920 - 1.0000
1.7185 940 - 1.0000
1.7550 960 - 1.0000
1.7916 980 - 1.0000
1.8282 1000 0.0012 1.0000
1.8647 1020 - 1.0000
1.9013 1040 - 1.0000
1.9378 1060 - 1.0000
1.9744 1080 - 1.0000
2.0 1094 - 1.0000

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.4.1
  • Transformers: 4.50.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}