metadata
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:100231
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
- source_sentence: >-
Represent this sentence for searching relevant passages: when did
university of georgia start playing football
sentences:
- >-
Georgia Bulldogs football The Georgia Bulldogs football program
represents the University of Georgia in the sport of American football.
The Bulldogs compete in the Football Bowl Subdivision (FBS) of the
National Collegiate Athletic Association (NCAA) and the Eastern Division
of the Southeastern Conference (SEC). They play their home games at
historic Sanford Stadium on the university's Athens, Georgia, campus.
Georgia's inaugural season was in 1892. UGA claims two consensus
national championships (1942 and 1980); the AP and Coaches Polls have
each voted the Bulldogs the national champion once (1980); Georgia has
also been named the National Champion by at least one polling authority
in three other seasons (1927, 1946 and 1968).[5] The Bulldogs have won
15 conference championships, including 13 SEC championships (tied for
2nd most all-time), and have appeared in 54 bowl games, tied for second
most all-time. The program has also produced two Heisman Trophy winners,
four No. 1 National Football League (NFL) draft picks, and many winners
of other national awards. The team is known for its storied history,
unique traditions, and rabid fan base. Georgia has won over 800 games in
their history, placing them 11th all time in wins.[6]
- >-
Octavio Dotel Octavio Eduardo Dotel Diaz (born November 25, 1973) is a
Dominican former professional baseball pitcher. Dotel played for
thirteen major league teams, more than any other player in the history
of Major League Baseball (MLB), setting the mark when he pitched for the
Detroit Tigers on April 7, 2012, breaking a record previously held by
Mike Morgan, Matt Stairs, and Ron Villone.[1] Edwin Jackson tied this
record in 2018.[2] He was a member of the Houston Astros for 5 seasons.
- >-
Who Wants to Be a Millionaire (U.S. game show) Over the course of the
programs history, 12 people have answered the final question correctly
and walked away with the top prize. These include:
- source_sentence: >-
Represent this sentence for searching relevant passages: when was video
killed the radio star released
sentences:
- >-
Video Killed the Radio Star "Video Killed the Radio Star" is a song
written by Trevor Horn, Geoff Downes and Bruce Woolley in 1978. It was
first recorded by Bruce Woolley and The Camera Club (with Thomas Dolby
on keyboards) for their album English Garden, and later by British group
the Buggles, consisting of Horn and Downes. The track was recorded and
mixed in 1979, released as their debut single on 7 September 1979 by
Island Records, and included on their first album The Age of Plastic.
The backing track was recorded at Virgin's Town House in West London,
and mixing and vocal recording would later take place at Sarm East
Studios.
- >-
Haley Joel Osment Haley Joel Osment (born April 10, 1988) is an American
actor. After a series of roles in television and film during the 1990s,
including a small part in Forrest Gump playing the title character's son
(also named Forrest Gump), Osment rose to fame for his performance as a
young unwilling medium in M. Night Shyamalan's thriller film The Sixth
Sense, which earned him a nomination for the Academy Award for Best
Supporting Actor. He subsequently appeared in leading roles in several
high-profile Hollywood films, including Steven Spielberg's A.I.
Artificial Intelligence and Mimi Leder's Pay It Forward.
- >-
How Come The song is about the relationship between the members of D12.
Eminem makes reference to his relationship to Proof, Kon Artis talks
about Eminem and Kim's relationship, and Proof talks about the rift
between him and Eminem.
- source_sentence: >-
Represent this sentence for searching relevant passages: when do the
badgers play their first football game
sentences:
- >-
The Song of Roland Charlemagne's army is fighting the Muslims in Spain.
They have been there for seven years, and the last city standing is
Saragossa, held by the Muslim King Marsile. Threatened by the might of
Charlemagne's army of Franks, Marsile seeks advice from his wise man,
Blancandrin, who councils him to conciliate the Emperor, offering to
surrender and giving hostages. Accordingly, Marsile sends out messengers
to Charlemagne, promising treasure and Marsile's conversion to
Christianity if the Franks will go back to France.
- >-
Tell Your Heart to Beat Again "Tell Your Heart to Beat Again" is a song
written by Bernie Herms, Randy Phillips, and Matthew West and originally
recorded by Contemporary Christian-Worship trio, Phillips, Craig and
Dean for their twelfth studio album, Breathe In (2012).[1] It was later
recorded by American singer Danny Gokey for his second studio album,
Hope in Front of Me (2014), with production by Herms. Gokey's rendition
was released to Christian radio January 8, 2016 through BMG-Chrysalis as
the album's third single.[2] The song reached number two on the
Billboard Christian Songs chart, earning Gokey his highest-charting
single to date.[3]
- >-
Wisconsin Badgers football The first Badger football team took the field
in 1889, losing the only two games it played that season. In 1890,
Wisconsin earned its first victory with a 106–0 drubbing of Whitewater
Normal School (now the University of Wisconsin–Whitewater), still the
most lopsided win in school history. However, the very next week the
Badgers suffered what remains their most lopsided defeat, a humiliating
63–0 loss at the hands of the University of Minnesota. Since then, the
Badgers and Gophers have met 122 times, making Wisconsin vs Minnesota
the most-played rivalry in the Football Bowl Subdivision.[5]
- source_sentence: >-
Represent this sentence for searching relevant passages: percentage of
babies born at 24 weeks that survive
sentences:
- >-
Industrial Revolution The precise start and end of the Industrial
Revolution is still debated among historians, as is the pace of economic
and social changes.[10][11][12][13] Eric Hobsbawm held that the
Industrial Revolution began in Britain in the 1780s and was not fully
felt until the 1830s or 1840s,[10] while T. S. Ashton held that it
occurred roughly between 1760 and 1830.[11] Rapid industrialization
first began in Britain, starting with mechanized spinning in the
1780s,[14] with high rates of growth in steam power and iron production
occurring after 1800. Mechanized textile production spread from Great
Britain to continental Europe and the United States in the early 19th
century, with important centres of textiles, iron and coal emerging in
Belgium and the United States and later textiles in France.[1]
- >-
The Shootist The film's expansive outdoor scenes were filmed on location
in Carson City. Bond Rogers' boarding house is the 1914 Krebs-Peterson
House, located in Carson City’s historic residential district. The buggy
ride was shot at Washoe Lake State Park, in the Washoe Valley, between
Reno and Carson City. Though it was a Paramount production, the street
scenes and most interior shots were filmed at the Warner Bros. backlot
and sound stages in Burbank, California.[8] The horse-drawn trolley was
an authentic one, once used as a shuttle between El Paso and Juarez,
Mexico.[9]
- "Fetal viability There is no sharp limit of development, gestational age, or weight at which a human fetus automatically becomes viable.[1] According to studies between 2003 and 2005, 20 to 35 percent of babies born at 23 weeks of gestation survive, while 50 to 70 percent of babies born at 24 to 25 weeks, and more than 90 percent born at 26 to 27 weeks, survive.[4] It is rare for a baby weighing less than 500\_g (17.6\_ounces) to survive.[1] A baby's chances for survival increase 3-4% per day between 23 and 24 weeks of gestation and about 2-3% per day between 24 and 26 weeks of gestation. After 26 weeks the rate of survival increases at a much slower rate because survival is high already.[5][6][7][8]"
- source_sentence: >-
Represent this sentence for searching relevant passages: when did bear
inthe big blue house come out
sentences:
- >-
Court of Appeals of the Philippines The Court of Appeals of the
Philippines (Filipino: Hukuman ng Apelasyon ng Pilipinas) is the
Philippines' second-highest judicial court, just after the Supreme
Court. The court consists of 69 Associate Justices and 1 Presiding
Justice. Under the Constitution, the Court of Appeals (CA) "reviews not
only the decisions and orders of the Regional Trial Courts nationwide
but also those of the Court of Tax Appeals, as well as the awards,
judgments, final orders or resolutions of, or authorized by 21
Quasi-Judicial Agencies exercising quasi-judicial functions mentioned in
Rule 43 of the 1997 Rules of Civil Procedure, plus the National Amnesty
Commission (Pres. Proclamation No. 347 of 1994) and Office of the
Ombudsman (Fabian v. Desierto, 295 SCRA 470). Under RA 9282, which
elevated the CTA to the same level of the CA, CTA en banc decisions are
now subject to review by the Supreme Court instead of the CA (as opposed
to what is currently provided in Section 1, Rule 43 of the Rules of
Court). Added to the formidable list are the decisions and resolutions
of the National Labor Relations Commission (NLRC) which are now
initially reviewable by this court, instead of a direct recourse to the
Supreme Court, via petition for certiorari under Rule 65 (St. Martin
Funeral Homes v. NLRC, 295 SCRA 414)".
- >-
Diatom Diatoms are a widespread group and can be found in the oceans, in
fresh water, in soils, and on damp surfaces. They are one of the
dominant components of phytoplankton in nutrient-rich coastal waters and
during oceanic spring blooms, since they can divide more rapidly than
other groups of phytoplankton.[66] Most live pelagically in open water,
although some live as surface films at the water-sediment interface
(benthic), or even under damp atmospheric conditions. They are
especially important in oceans, where they contribute an estimated 45%
of the total oceanic primary production of organic material.[67] Spatial
distribution of marine phytoplankton species is restricted both
horizontally and vertically.[68][20]
- >-
Bear in the Big Blue House Bear in the Big Blue House is an American
children's television series created by Mitchell Kriegman and produced
by Jim Henson Television for Disney Channel's Playhouse Disney preschool
television block. Debuting on October 20, 1997,[1][2] it aired its last
episode on April 28, 2006.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
SentenceTransformer based on BAAI/bge-base-en-v1.5
This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-base-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
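The architecture above takes the `[CLS]` token vector as the sentence embedding and then L2-normalizes it, so a plain dot product between two embeddings equals their cosine similarity. Below is a minimal sketch of those two steps in pure Python. The 3-dimensional toy vectors and the helper names `cls_pool`, `l2_normalize`, and `dot` are illustrative assumptions, not part of the library.

```python
import math

def cls_pool(token_embeddings):
    """Return the first token's vector ([CLS]) as the sentence embedding."""
    return list(token_embeddings[0])

def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy token embeddings for two sentences (the real model uses 768 dimensions).
# Only the first row of each sentence (the [CLS] position) is used by CLS pooling.
sentence_a = [[3.0, 4.0, 0.0], [9.0, 9.0, 9.0]]
sentence_b = [[4.0, 3.0, 0.0], [1.0, 2.0, 3.0]]

emb_a = l2_normalize(cls_pool(sentence_a))
emb_b = l2_normalize(cls_pool(sentence_b))

# Because both embeddings have unit length, this dot product is the cosine similarity.
cosine = dot(emb_a, emb_b)
print(cosine)
```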
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("arvindcreatrix/bge-baes-my-qna-model")
# Run inference
sentences = [
'Represent this sentence for searching relevant passages: when did bear inthe big blue house come out',
"Bear in the Big Blue House Bear in the Big Blue House is an American children's television series created by Mitchell Kriegman and produced by Jim Henson Television for Disney Channel's Playhouse Disney preschool television block. Debuting on October 20, 1997,[1][2] it aired its last episode on April 28, 2006.",
'Court of Appeals of the Philippines The Court of Appeals of the Philippines (Filipino: Hukuman ng Apelasyon ng Pilipinas) is the Philippines\' second-highest judicial court, just after the Supreme Court. The court consists of 69 Associate Justices and 1 Presiding Justice. Under the Constitution, the Court of Appeals (CA) "reviews not only the decisions and orders of the Regional Trial Courts nationwide but also those of the Court of Tax Appeals, as well as the awards, judgments, final orders or resolutions of, or authorized by 21 Quasi-Judicial Agencies exercising quasi-judicial functions mentioned in Rule 43 of the 1997 Rules of Civil Procedure, plus the National Amnesty Commission (Pres. Proclamation No. 347 of 1994) and Office of the Ombudsman (Fabian v. Desierto, 295 SCRA 470). Under RA 9282, which elevated the CTA to the same level of the CA, CTA en banc decisions are now subject to review by the Supreme Court instead of the CA (as opposed to what is currently provided in Section 1, Rule 43 of the Rules of Court). Added to the formidable list are the decisions and resolutions of the National Labor Relations Commission (NLRC) which are now initially reviewable by this court, instead of a direct recourse to the Supreme Court, via petition for certiorari under Rule 65 (St. Martin Funeral Homes v. NLRC, 295 SCRA 414)".',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
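For retrieval, the training queries in this card all start with the prefix "Represent this sentence for searching relevant passages: ", so it seems reasonable to add the same prefix to queries at inference time and leave passages unprefixed. The sketch below ranks passages for one query. The `rank_passages` helper and the `toy_encode` stand-in encoder are hypothetical and exist only so the example runs without downloading the model; with the real model you would pass `model.encode` instead.

```python
import math

QUERY_PROMPT = "Represent this sentence for searching relevant passages: "

def rank_passages(query, passages, encode):
    """Return (score, passage) pairs sorted from most to least similar.

    `encode` is any callable mapping a list of strings to unit-length vectors;
    with the real model this would be `model.encode`.
    """
    vectors = encode([QUERY_PROMPT + query] + list(passages))
    query_vec, passage_vecs = vectors[0], vectors[1:]
    # Embeddings are unit length, so the dot product is the cosine similarity.
    scored = [
        (sum(q * p for q, p in zip(query_vec, vec)), text)
        for vec, text in zip(passage_vecs, passages)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

def toy_encode(texts):
    """Hypothetical stand-in encoder: normalized counts of a few keywords."""
    vocab = ["bear", "house", "court", "diatom"]
    vectors = []
    for text in texts:
        words = text.lower().split()
        counts = [float(words.count(word)) for word in vocab]
        norm = math.sqrt(sum(c * c for c in counts)) or 1.0
        vectors.append([c / norm for c in counts])
    return vectors

ranking = rank_passages(
    "when did bear inthe big blue house come out",
    ["court of appeals of the philippines", "bear in the big blue house"],
    toy_encode,
)
for score, passage in ranking:
    print(f"{score:.3f}  {passage}")
```

Passing the encoder in as an argument keeps the ranking logic independent of the model, which also makes it easy to test.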
Training Details
Training Dataset
Unnamed Dataset
- Size: 100,231 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:
| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |
| details | min: 17 tokens; mean: 19.79 tokens; max: 31 tokens | min: 18 tokens; mean: 137.51 tokens; max: 512 tokens |
- Samples:
sentence_0 | sentence_1 |
---|---|
Represent this sentence for searching relevant passages: in baseball statistics what does opie s stand for | On-base plus slugging On-base plus slugging (OPS) is a sabermetric baseball statistic calculated as the sum of a player's on-base percentage and slugging average.[1] The ability of a player both to get on base and to hit for power, two important offensive skills, are represented. An OPS of .900 or higher in Major League Baseball puts the player in the upper echelon of hitters. Typically, the league leader in OPS will score near, and sometimes above, the 1.000 mark. |
Represent this sentence for searching relevant passages: what was the first year of the nissan gtr | Nissan GT-R The Nissan GT-R is a 2-door 2+2 high performance vehicle produced by Nissan, unveiled in 2007.[2][3][4] It is the successor to the Nissan Skyline GT-R, although no longer part of the Skyline range itself, the name having been given over to the R35 Series and having since left its racing roots. |
Represent this sentence for searching relevant passages: how long do former presidents receive secret service protection | Former Presidents Act Former presidents were entitled from 1965 to 1996 to lifetime Secret Service protection, for themselves, spouses, and children under 16. A 1994 statute, (Pub.L. 103–329), limited post-presidential protection to ten years for presidents inaugurated after January 1, 1997.[7] Under this statute, Bill Clinton would still be entitled to lifetime protection, and all subsequent presidents would have been entitled to ten years' protection.[8] On January 10, 2013, President Barack Obama signed the Former Presidents Protection Act of 2012, reinstating lifetime Secret Service protection for his predecessor George W. Bush, himself, and all subsequent presidents.[9] |
- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
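To make the loss parameters concrete: MultipleNegativesRankingLoss treats each (sentence_0, sentence_1) pair in a batch as a positive and uses every other sentence_1 in the same batch as a negative. It multiplies the cosine similarities by `scale` and applies cross-entropy with the matching index as the label. The following is a minimal pure-Python sketch of that computation on toy 2-dimensional vectors, not the library's implementation; the function names and the toy batch are illustrative assumptions.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch acts as an in-batch negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the scaled similarities.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batch of two pairs: each anchor is closest to its own positive.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.1], [0.1, 1.0]]
shuffled = [aligned[1], aligned[0]]  # positives deliberately mismatched

loss_aligned = mnr_loss(anchors, aligned)
loss_shuffled = mnr_loss(anchors, shuffled)
print(loss_aligned, loss_shuffled)
```

A larger batch gives each anchor more in-batch negatives, which is one reason `per_device_train_batch_size` matters for this loss.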
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- num_train_epochs: 1
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
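With `lr_scheduler_type: linear`, `learning_rate: 5e-05`, and no warmup, the learning rate should fall in a straight line from 5e-05 at step 0 to 0 at the final step. The sketch below reproduces that shape in plain Python. The function name `linear_lr` is hypothetical, and the total of 3,133 optimizer steps is an assumption derived from 100,231 samples at batch size 32 on a single device.

```python
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Assumed total: ceil(100231 / 32) = 3133 steps for one epoch on one device.
TOTAL_STEPS = 3133

for step in (0, 1000, 2000, 3000, TOTAL_STEPS):
    print(step, linear_lr(step, TOTAL_STEPS))
```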
Training Logs
Epoch | Step | Training Loss |
---|---|---|
0.1596 | 500 | 0.0357 |
0.3192 | 1000 | 0.0175 |
0.4788 | 1500 | 0.0144 |
0.6384 | 2000 | 0.0142 |
0.7980 | 2500 | 0.0144 |
0.9575 | 3000 | 0.0133 |
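The epoch fractions in this log can be checked against the dataset size and batch size reported above. The short calculation below assumes a single training device; the card confirms `gradient_accumulation_steps: 1` and `dataloader_drop_last: False`. Under that assumption one epoch takes ceil(100231 / 32) = 3133 steps, and each logged step divided by 3133 matches the Epoch column.

```python
import math

dataset_size = 100_231  # training samples reported in the card
batch_size = 32         # per_device_train_batch_size, assuming one device

# The last partial batch is kept (dataloader_drop_last: False), hence the ceiling.
steps_per_epoch = math.ceil(dataset_size / batch_size)

logged_steps = [500, 1000, 1500, 2000, 2500, 3000]
epoch_fractions = [f"{step / steps_per_epoch:.4f}" for step in logged_steps]

print(steps_per_epoch)
print(epoch_fractions)
```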
Framework Versions
- Python: 3.11.13
- Sentence Transformers: 4.1.0
- Transformers: 4.54.0
- PyTorch: 2.6.0+cu124
- Accelerate: 1.9.0
- Datasets: 4.0.0
- Tokenizers: 0.21.2
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}