tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:100231
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-small-en-v1.5
widget:
- source_sentence: >-
Represent this sentence for searching relevant passages: where do the
chances live on raising hope
sentences:
- >-
Raising Hope James "Jimmy" Chance is a 23-year old, living in the
surreal fictional town of Natesville, who impregnates a serial killer
during a one-night stand. Earning custody of his daughter, Hope, after
the mother is sentenced to death, Jimmy relies on his oddball but
well-intentioned family for support in raising the child.
- >-
Quadripoint A quadripoint is a point on the Earth that touches the
border of four distinct territories.[1][2] The term has never been in
common use—it may not have been used before 1964 when it was possibly
invented by the Office of the Geographer of the United States Department
of State.[3][n 1] The word does not appear in the Oxford English
Dictionary or Merriam-Webster Online dictionary, but it does appear in
the Encyclopædia Britannica,[4] as well as in the World Factbook
articles on Botswana, Namibia, Zambia, and Zimbabwe, dating as far back
as 1990.[5]
- >-
Show Me the Way to Go Home The song was recorded by several artists in
the 1920s, including radio personalities The Happiness Boys,[2] Vincent
Lopez and his Orchestra,[2] and the California Ramblers.[3] Throughout
the twentieth into the twenty-first century it has been recorded by
numerous artists.
- source_sentence: >-
Represent this sentence for searching relevant passages: who wrote the
book of john in the bible
sentences:
- >-
Gospel of John Although the Gospel of John is anonymous,[1] Christian
tradition historically has attributed it to John the Apostle, son of
Zebedee and one of Jesus' Twelve Apostles. The gospel is so closely
related in style and content to the three surviving Johannine epistles
that commentators treat the four books,[2] along with the Book of
Revelation, as a single corpus of Johannine literature, albeit not
necessarily written by the same author.[Notes 1]
- >-
Levi Strauss & Co. Levi Strauss & Co. /ˌliːvaɪ ˈstraʊs/ is a privately
held[5] American clothing company known worldwide for its Levi's
/ˌliːvaɪz/ brand of denim jeans. It was founded in May 1853[6] when
German immigrant Levi Strauss came from Buttenheim, Bavaria, to San
Francisco, California to open a west coast branch of his brothers' New
York dry goods business.[7] The company's corporate headquarters is
located in the Levi's Plaza in San Francisco.[8]
- >-
Saturday Night Fever Tony's friends come to the car along with an
intoxicated Annette. Joey says she has agreed to have sex with everyone.
Tony tries to lead her away, but is subdued by Double J and Joey, and
sullenly leaves with the group in the car. Double J and Joey rape
Annette. Bobby C. pulls the car over on the Verrazano-Narrows Bridge for
their usual cable-climbing antics. Instead of abstaining as usual, Bobby
performs stunts more recklessly than the rest of the gang. Realizing
that he is acting recklessly, Tony tries to get him to come down.
Bobby's strong sense of despair, the situation with Pauline, and Tony's
broken promise to call him earlier that day all lead to a suicidal
tirade about Tony's lack of caring before Bobby slips and falls to his
death in the water below.
- source_sentence: >-
Represent this sentence for searching relevant passages: what type of
habitat do sea turtles live in
sentences:
- >-
Turbidity Governments have set standards on the allowable turbidity in
drinking water. In the United States, systems that use conventional or
direct filtration methods turbidity cannot be higher than 1.0
nephelometric turbidity units (NTU) at the plant outlet and all samples
for turbidity must be less than or equal to 0.3 NTU for at least 95
percent of the samples in any month. Systems that use filtration other
than the conventional or direct filtration must follow state limits,
which must include turbidity at no time exceeding 5 NTU. Many drinking
water utilities strive to achieve levels as low as 0.1 NTU.[11] The
European standards for turbidity state that it must be no more than 4
NTU.[12] The World Health Organization, establishes that the turbidity
of drinking water should not be more than 5 NTU, and should ideally be
below 1 NTU.[13]
- >-
List of 1924 Winter Olympics medal winners Finnish speed skater Clas
Thunberg topped the medal count with five medals: three golds, one
silver, and one bronze. One of his competitors, Roald Larsen of Norway,
also won five medals, with two silver and three bronze medal-winning
performances.[3] The first gold medalist at these Games—and therefore
the first gold medalist in Winter Olympic history—was American speed
skater Charles Jewtraw. Only one medal change took place after the
Games: in the ski jump competition, a marking error deprived American
athlete Anders Haugen of a bronze medal. Haugen pursued an appeal to the
IOC many years after the fact; he was awarded the medal after a 1974
decision in his favor.[1]
- >-
Sea turtle Sea turtles are generally found in the waters over
continental shelves. During the first three to five years of life, sea
turtles spend most of their time in the pelagic zone floating in seaweed
mats. Green sea turtles in particular are often found in Sargassum mats,
in which they find shelter and food.[14] Once the sea turtle has reached
adulthood it moves closer to the shore.[15] Females will come ashore to
lay their eggs on sandy beaches during the nesting season.[16]
- source_sentence: >-
Represent this sentence for searching relevant passages: what triggers the
release of calcium from the sarcoplasmic reticulum
sentences:
- >-
Pretty Little Liars (season 7) The season consisted of 20 episodes, in
which ten episodes aired in the summer of 2016, with the remaining ten
episodes aired from April 2017.[2][3][4] The season's premiere aired on
June 21, 2016, on Freeform.[5] Production and filming began in the end
of March 2016, which was confirmed by showrunner I. Marlene King.[6] The
season premiere was written by I. Marlene King and directed by Ron
Lagomarsino.[7] King revealed the title of the premiere on Twitter on
March 17, 2016.[8] On August 29, 2016, it was confirmed that this would
be the final season of the series.[9]
- >-
Wentworth (TV series) A seventh season was commissioned in April 2018,
before the sixth-season premiere, with filming commencing the following
week and a premiere set for 2019.
- >-
Sarcoplasmic reticulum Calcium ion release from the SR, occurs in the
junctional SR/terminal cisternae through a ryanodine receptor (RyR) and
is known as a calcium spark.[10] There are three types of ryanodine
receptor, RyR1 (in skeletal muscle), RyR2 (in cardiac muscle) and RyR3
(in the brain).[11] Calcium release through ryanodine receptors in the
SR is triggered differently in different muscles. In cardiac and smooth
muscle an electrical impulse (action potential) triggers calcium ions to
enter the cell through an L-type calcium channel located in the cell
membrane (smooth muscle) or T-tubule membrane (cardiac muscle). These
calcium ions bind to and activate the RyR, producing a larger increase
in intracellular calcium. In skeletal muscle, however, the L-type
calcium channel is bound to the RyR. Therefore activation of the L-type
calcium channel, via an action potential, activates the RyR directly,
causing calcium release (see calcium sparks for more details).[12] Also,
caffeine (found in coffee) can bind to and stimulate RyR. Caffeine works
by making the RyR more sensitive to either the action potential
(skeletal muscle) or calcium (cardiac or smooth muscle) therefore
producing calcium sparks more often (this can result in increased heart
rate, which is why we feel more awake after coffee).[13]
- source_sentence: >-
Represent this sentence for searching relevant passages: what topic do all
scientific questions have in common
sentences:
- >-
Jane Wyatt Wyatt portrayed Amanda Grayson, Spock's mother and Ambassador
Sarek's (Mark Lenard) wife, in the 1967 episode "Journey to Babel" of
the original NBC series, Star Trek, and the 1986 film Star Trek IV: The
Voyage Home.[9] Wyatt was once quoted as saying her fan mail for these
two appearances in this role exceeded that of Lost Horizon. In 1969, she
made a guest appearance on Here Come the Brides, but did not have any
scenes with Mark Lenard, who was starring on the show as sawmill owner
Aaron Stemple.
- >-
Minnesota Vikings The Vikings played in Super Bowl XI, their third Super
Bowl (fourth overall) in four years, against the Oakland Raiders at the
Rose Bowl in Pasadena, California, on January 9, 1977. The Vikings,
however, lost 32–14.[1]
- >-
List of topics characterized as pseudoscience Criticism of
pseudoscience, generally by the scientific community or skeptical
organizations, involves critiques of the logical, methodological, or
rhetorical bases of the topic in question.[1] Though some of the listed
topics continue to be investigated scientifically, others were only
subject to scientific research in the past, and today are considered
refuted but resurrected in a pseudoscientific fashion. Other ideas
presented here are entirely non-scientific, but have in one way or
another infringed on scientific domains or practices.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
SentenceTransformer based on BAAI/bge-small-en-v1.5
This is a sentence-transformers model finetuned from BAAI/bge-small-en-v1.5. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-small-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
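The last two modules above (CLS-token pooling followed by normalization) can be illustrated with a small, dependency-free sketch. The toy 4-dimensional token vectors and the helper names `cls_pool`, `l2_normalize`, and `dot` are assumptions made for illustration only; the real model produces 384-dimensional vectors from a BERT encoder.

```python
import math

def cls_pool(token_embeddings):
    """Use the first token's vector (the [CLS] position) as the sentence embedding."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical token embeddings: one row per token, first row is [CLS].
tokens_a = [[3.0, 4.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0], [1.0, 0.0, 0.0, 0.0]]
tokens_b = [[4.0, 3.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

emb_a = l2_normalize(cls_pool(tokens_a))
emb_b = l2_normalize(cls_pool(tokens_b))

# After normalization, the dot product is the cosine similarity.
cosine = dot(emb_a, emb_b)
print(f"cosine similarity: {cosine:.2f}")
```

Because the embeddings leave the model already normalized, a dot product and a cosine similarity give the same ranking.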
Fine-Tuned BGE-Small Model for Q&A
This is a BAAI/bge-small-en-v1.5 model that has been fine-tuned for a specific Question & Answering task using the MultipleNegativesRankingLoss in the sentence-transformers library.
It was trained on a private dataset of 100,231 question-passage pairs. Its intended use is as the retriever model in a Retrieval-Augmented Generation (RAG) system: it is trained to map questions to the passages that contain their answers.
How to Use (Practical Inference Example)
The primary use case is to find the most relevant passage for a given query.
from sentence_transformers import SentenceTransformer, util

# Load the fine-tuned model from the Hub
model_id = "srinivasanAI/bge-small-my-qna-model"  # Replace with your model ID
model = SentenceTransformer(model_id)

# Queries were prefixed with this BGE retrieval instruction during fine-tuning,
# so prefix it at inference too (passages are encoded without it)
instruction = "Represent this sentence for searching relevant passages: "

# 1. Define your query and your potential answers (passages)
query = instruction + "What is the powerhouse of the cell?"
passages = [
    "Mitochondria are organelles that act like a digestive system and are often called the powerhouse of the cell.",
    "The cell wall is a rigid layer that provides structural support to plant cells.",
    "The sun is a star at the center of the Solar System.",
]

# 2. Encode the single query and the list of passages separately
query_embedding = model.encode(query)
passage_embeddings = model.encode(passages)

# 3. Calculate the similarity between the single query and all passages
similarities = util.cos_sim(query_embedding, passage_embeddings)

# 4. Print the results (the instruction prefix is stripped for display)
print(f"Query: {query.replace(instruction, '')}\n")
for i, passage in enumerate(passages):
    print(f"Similarity: {similarities[0][i]:.4f} | Passage: {passage}")
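In a retriever setting, the similarity scores are usually turned into a top-k ranking. The following is a minimal sketch of that step, using toy 3-dimensional vectors in place of real `model.encode(...)` outputs so that it runs without downloading the model. The helpers `build_query`, `cosine`, and `top_k` are hypothetical names introduced here, not part of sentence-transformers.

```python
import math

INSTRUCTION = "Represent this sentence for searching relevant passages: "

def build_query(question):
    """Prefix the retrieval instruction to a query; passages are left as-is."""
    return INSTRUCTION + question

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def top_k(query_vec, passage_vecs, k=2):
    """Return (passage_index, score) pairs for the k most similar passages."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]

# Toy vectors standing in for the encoded query and passages.
query_vec = [1.0, 0.0, 0.0]
passage_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]

ranking = top_k(query_vec, passage_vecs, k=2)
for idx, score in ranking:
    print(f"passage {idx}: {score:.4f}")
```

With real embeddings, the passages at the returned indices would be passed to the generator as context.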
Training Details
Training Dataset
Unnamed Dataset
- Size: 100,231 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:
| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |
| details | min: 18 tokens, mean: 19.69 tokens, max: 31 tokens | min: 16 tokens, mean: 139.68 tokens, max: 512 tokens |
- Samples:
| sentence_0 | sentence_1 |
|---|---|
| Represent this sentence for searching relevant passages: where did strangers prey at night take place | The Strangers: Prey at Night In a secluded trailer park in Salem, Arkansas, the three masked killers, The Walker family — Dollface, Pin Up Girl, and the Man in the Mask — arrive. Dollface kills a female occupant and then lies down in bed next to the woman's sleeping husband. |
| Represent this sentence for searching relevant passages: what is the average height of the highest peaks in the drakensberg mountain range | Drakensberg During the past 20 million years, further massive upliftment, especially in the East, has taken place in Southern Africa. As a result, most of the plateau lies above 1,000 m (3,300 ft) despite the extensive erosion. The plateau is tilted such that its highest point is in the east, and it slopes gently downwards towards the west and south. The elevation of the edge of the eastern escarpments is typically in excess of 2,000 m (6,600 ft). It reaches its highest point (over 3,000 m (9,800 ft)) where the escarpment forms part of the international border between Lesotho and the South African province of KwaZulu-Natal.[5][8] |
| Represent this sentence for searching relevant passages: name the two epics of india which are woven around with legends | Indian epic poetry Indian epic poetry is the epic poetry written in the Indian subcontinent, traditionally called Kavya (or Kāvya; Sanskrit: काव्य, IAST: kāvyá). The Ramayana and the Mahabharata, which were originally composed in Sanskrit and later translated into many other Indian languages, and The Five Great Epics of Tamil Literature and Sangam literature are some of the oldest surviving epic poems ever written.[1] |
- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
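To make the two loss parameters concrete, here is a minimal pure-Python sketch of the idea behind MultipleNegativesRankingLoss as it is commonly described: for each query, the paired passage is the positive, every other passage in the batch is a negative, and cross-entropy is taken over cosine similarities multiplied by `scale`. This is an illustrative approximation under those assumptions, not the library's implementation; `cos_sim` and `mnr_loss` are names defined here.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def mnr_loss(query_vecs, passage_vecs, scale=20.0):
    """Mean cross-entropy where passage i is the positive for query i
    and all other passages in the batch act as negatives."""
    losses = []
    for i, query in enumerate(query_vecs):
        logits = [scale * cos_sim(query, passage) for passage in passage_vecs]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

queries = [[1.0, 0.0], [0.0, 1.0]]

# Each query points at its own passage: loss is close to zero.
aligned = mnr_loss(queries, [[1.0, 0.0], [0.0, 1.0]])

# Passages swapped: each query is most similar to the wrong passage.
swapped = mnr_loss(queries, [[0.0, 1.0], [1.0, 0.0]])

print(f"aligned: {aligned:.6f}  swapped: {swapped:.6f}")
```

A larger `scale` sharpens the softmax, so a wrong passage that scores close to the right one is penalized more heavily.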
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- num_train_epochs: 1
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
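The learning-rate settings above (learning_rate 5e-05, linear scheduler, zero warmup) imply a rate that decays linearly from the base value to zero over the run. The sketch below is a rough illustration of that rule as I understand the usual linear-decay schedule; it assumes a single training device, so the total step count is the dataset size divided by the per-device batch size, rounded up. `linear_lr` is a hypothetical helper, not a library function.

```python
import math

DATASET_SIZE = 100_231   # training samples reported in this card
BATCH_SIZE = 32          # per_device_train_batch_size (single device assumed)
BASE_LR = 5e-5           # learning_rate
WARMUP_STEPS = 0         # warmup_steps

# One epoch with dataloader_drop_last=False keeps the final partial batch.
total_steps = math.ceil(DATASET_SIZE / BATCH_SIZE)

def linear_lr(step):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / max(1, total_steps - WARMUP_STEPS)

for step in (0, 1000, 2000, 3000, total_steps):
    print(f"step {step:>4}: lr = {linear_lr(step):.3e}")
```

With no warmup, the very first steps use the full base rate, which may be worth revisiting if early training is unstable.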
Training Logs
Epoch | Step | Training Loss |
---|---|---|
0.1596 | 500 | 0.0556 |
0.3192 | 1000 | 0.0245 |
0.4788 | 1500 | 0.0236 |
0.6384 | 2000 | 0.0179 |
0.7980 | 2500 | 0.0202 |
0.9575 | 3000 | 0.0184 |
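The Epoch column in the log can be cross-checked from the figures reported elsewhere in this card. The short sketch below assumes a single device and no gradient accumulation, so that one optimizer step consumes one batch of 32 samples. Under that assumption an epoch has 3,133 steps, and dividing each logged step by that number reproduces the logged epoch fractions.

```python
import math

DATASET_SIZE = 100_231   # training samples
BATCH_SIZE = 32          # per_device_train_batch_size (single device assumed)

# The final partial batch is kept (dataloader_drop_last is False), hence ceil.
steps_per_epoch = math.ceil(DATASET_SIZE / BATCH_SIZE)

logged_steps = [500, 1000, 1500, 2000, 2500, 3000]
epoch_fractions = [round(step / steps_per_epoch, 4) for step in logged_steps]

print(f"steps per epoch: {steps_per_epoch}")
for step, fraction in zip(logged_steps, epoch_fractions):
    print(f"step {step}: epoch {fraction:.4f}")
```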
Framework Versions
- Python: 3.11.13
- Sentence Transformers: 4.1.0
- Transformers: 4.53.3
- PyTorch: 2.6.0+cu124
- Accelerate: 1.9.0
- Datasets: 4.0.0
- Tokenizers: 0.21.2
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}