metadata
base_model: sentence-transformers/all-mpnet-base-v2
datasets:
  - sentence-transformers/squad
language:
  - en
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:87599
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: What prompted transportation improvements in Portugal in the 1970's?
    sentences:
      - >-
        Greenhouses convert solar light to heat, enabling year-round production
        and the growth (in enclosed environments) of specialty crops and other
        plants not naturally suited to the local climate. Primitive greenhouses
        were first used during Roman times to produce cucumbers year-round for
        the Roman emperor Tiberius. The first modern greenhouses were built in
        Europe in the 16th century to keep exotic plants brought back from
        explorations abroad. Greenhouses remain an important part of
        horticulture today, and plastic transparent materials have also been
        used to similar effect in polytunnels and row covers.
      - >-
        By the early 1970s Portugal's fast economic growth with increasing
        consumption and purchase of new automobiles set the priority for
        improvements in transportation. Again in the 1990s, after joining the
        European Economic Community, the country built many new motorways.
        Today, the country has a 68,732 km (42,708 mi) road network, of which
        almost 3,000 km (1,864 mi) are part of system of 44 motorways. Opened in
        1944, the first motorway (which linked Lisbon to the National Stadium)
        was an innovative project that made Portugal among one of the first
        countries in the world to establish a motorway (this roadway eventually
        became the Lisbon-Cascais highway, or A5). But, although a few other
        tracts were created (around 1960 and 1970), it was only after the
        beginning of the 1980s that large-scale motorway construction was
        implemented. In 1972, Brisa, the highway concessionaire, was founded to
        handle the management of many of the regions motorways. On many
        highways, toll needs to be paid, see Via Verde. Vasco da Gama bridge is
        the longest bridge in Europe.
      - >-
        Kanye West began his early production career in the mid-1990s, making
        beats primarily for burgeoning local artists, eventually developing a
        style that involved speeding up vocal samples from classic soul records.
        His first official production credits came at the age of nineteen when
        he produced eight tracks on Down to Earth, the 1996 debut album of a
        Chicago rapper named Grav. For a time, West acted as a ghost producer
        for Deric "D-Dot" Angelettie. Because of his association with D-Dot,
        West wasn't able to release a solo album, so he formed and became a
        member and producer of the Go-Getters, a late-1990s Chicago rap group
        composed of him, GLC, Timmy G, Really Doe, and Arrowstar. His group was
        managed by John "Monopoly" Johnson, Don Crowley, and Happy Lewis under
        the management firm Hustle Period. After attending a series of
        promotional photo shoots and making some radio appearances, The
        Go-Getters released their first and only studio album World Record
        Holders in 1999. The album featured other Chicago-based rappers such as
        Rhymefest, Mikkey Halsted, Miss Criss, and Shayla G. Meanwhile, the
        production was handled by West, Arrowstar, Boogz, and Brian "All Day"
        Miller.
  - source_sentence: What did Virchow feel Darwin's conclusions lacked?
    sentences:
      - >-
        Similar organizations in other countries followed: The American
        Anthropological Association in 1902, the Anthropological Society of
        Madrid (1865), the Anthropological Society of Vienna (1870), the Italian
        Society of Anthropology and Ethnology (1871), and many others
        subsequently. The majority of these were evolutionist. One notable
        exception was the Berlin Society of Anthropology (1869) founded by
        Rudolph Virchow, known for his vituperative attacks on the
        evolutionists. Not religious himself, he insisted that Darwin's
        conclusions lacked empirical foundation.
      - >-
        Russian Imperialism led to the Russian Empire's conquest of Central Asia
        during the late 19th century's Imperial Era. Between 1864 and 1885
        Russia gradually took control of the entire territory of Russian
        Turkestan, the Tajikistan portion of which had been controlled by the
        Emirate of Bukhara and Khanate of Kokand. Russia was interested in
        gaining access to a supply of cotton and in the 1870s attempted to
        switch cultivation in the region from grain to cotton (a strategy later
        copied and expanded by the Soviets).[citation needed] By 1885
        Tajikistan's territory was either ruled by the Russian Empire or its
        vassal state, the Emirate of Bukhara, nevertheless Tajiks felt little
        Russian influence.[citation needed]
      - >-
        A solar balloon is a black balloon that is filled with ordinary air. As
        sunlight shines on the balloon, the air inside is heated and expands
        causing an upward buoyancy force, much like an artificially heated hot
        air balloon. Some solar balloons are large enough for human flight, but
        usage is generally limited to the toy market as the surface-area to
        payload-weight ratio is relatively high.
  - source_sentence: What is the object of study for linguistic anthropology?
    sentences:
      - >-
        Anthropology of development tends to view development from a critical
        perspective. The kind of issues addressed and implications for the
        approach simply involve pondering why, if a key development goal is to
        alleviate poverty, is poverty increasing? Why is there such a gap
        between plans and outcomes? Why are those working in development so
        willing to disregard history and the lessons it might offer? Why is
        development so externally driven rather than having an internal basis?
        In short why does so much planned development fail?
      - >-
        The study of kinship and social organization is a central focus of
        sociocultural anthropology, as kinship is a human universal.
        Sociocultural anthropology also covers economic and political
        organization, law and conflict resolution, patterns of consumption and
        exchange, material culture, technology, infrastructure, gender
        relations, ethnicity, childrearing and socialization, religion, myth,
        symbols, values, etiquette, worldview, sports, music, nutrition,
        recreation, games, food, festivals, and language (which is also the
        object of study in linguistic anthropology).
      - >-
        On 1 February 1908, the king Dom Carlos I of Portugal and his heir
        apparent, Prince Royal Dom Luís Filipe, Duke of Braganza, were murdered
        in Lisbon. Under his rule, Portugal had twice been declared bankrupt –
        on 14 June 1892, and again on 10 May 1902 – causing social turmoil,
        economic disturbances, protests, revolts and criticism of the monarchy.
        Manuel II of Portugal became the new king, but was eventually overthrown
        by the 5 October 1910 revolution, which abolished the regime and
        instated republicanism in Portugal. Political instability and economic
        weaknesses were fertile ground for chaos and unrest during the
        Portuguese First Republic. These conditions would lead to the failed
        Monarchy of the North, 28 May 1926 coup d'état, and the creation of the
        National Dictatorship (Ditadura Nacional).
  - source_sentence: What is the official name of Portugal?
    sentences:
      - >-
        Portugal (Portuguese: [puɾtuˈɣaɫ]), officially the Portuguese Republic
        (Portuguese: República Portuguesa), is a country on the Iberian
        Peninsula, in Southwestern Europe. It is the westernmost country of
        mainland Europe, being bordered by the Atlantic Ocean to the west and
        south and by Spain to the north and east. The Portugal–Spain border is
        1,214 km (754 mi) long and considered the longest uninterrupted border
        within the European Union. The republic also includes the Atlantic
        archipelagos of the Azores and Madeira, both autonomous regions with
        their own regional governments.
      - >-
        The large magnitude of solar energy available makes it a highly
        appealing source of electricity. The United Nations Development
        Programme in its 2000 World Energy Assessment found that the annual
        potential of solar energy was 1,575–49,837 exajoules (EJ). This is
        several times larger than the total world energy consumption, which was
        559.8 EJ in 2012.
      - >-
        It was temporarily under the control of the Tibetan empire and Chinese
        from 650–680 and then under the control of the Umayyads in 710. The
        Samanid Empire, 819 to 999, restored Persian control of the region and
        enlarged the cities of Samarkand and Bukhara (both cities are today part
        of Uzbekistan) which became the cultural centers of Iran and the region
        was known as Khorasan. The Kara-Khanid Khanate conquered Transoxania
        (which corresponds approximately with modern-day Uzbekistan, Tajikistan,
        southern Kyrgyzstan and southwest Kazakhstan) and ruled between
        999–1211. Their arrival in Transoxania signaled a definitive shift from
        Iranian to Turkic predominance in Central Asia, but gradually the
        Kara-khanids became assimilated into the Perso-Arab Muslim culture of
        the region.
  - source_sentence: >-
      During what years did the formation of the First Portuguese Republic take
      place?
    sentences:
      - >-
        Anthrozoology (also known as "human–animal studies") is the study of
        interaction between living things. It is a burgeoning interdisciplinary
        field that overlaps with a number of other disciplines, including
        anthropology, ethology, medicine, psychology, veterinary medicine and
        zoology. A major focus of anthrozoologic research is the quantifying of
        the positive effects of human-animal relationships on either party and
        the study of their interactions. It includes scholars from a diverse
        range of fields, including anthropology, sociology, biology, and
        philosophy.[n 7]
      - >-
        Professional anthropological bodies often object to the use of
        anthropology for the benefit of the state. Their codes of ethics or
        statements may proscribe anthropologists from giving secret briefings.
        The Association of Social Anthropologists of the UK and Commonwealth
        (ASA) has called certain scholarship ethically dangerous. The AAA's
        current 'Statement of Professional Responsibility' clearly states that
        "in relation with their own government and with host governments ... no
        secret research, no secret reports or debriefings of any kind should be
        agreed to or given."
      - >-
        Many Portuguese holidays, festivals and traditions have a Christian
        origin or connotation. Although relations between the Portuguese state
        and the Roman Catholic Church were generally amiable and stable since
        the earliest years of the Portuguese nation, their relative power
        fluctuated. In the 13th and 14th centuries, the church enjoyed both
        riches and power stemming from its role in the reconquest, its close
        identification with early Portuguese nationalism and the foundation of
        the Portuguese educational system, including the first university. The
        growth of the Portuguese overseas empire made its missionaries important
        agents of colonization, with important roles in the education and
        evangelization of people from all the inhabited continents. The growth
        of liberal and nascent republican movements during the eras leading to
        the formation of the First Portuguese Republic (1910–26) changed the
        role and importance of organized religion.

SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2 on the squad dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: squad
  • Language: en

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
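
The Pooling and Normalize modules listed above can be illustrated with a small NumPy sketch. It is not the library's implementation: the function name `mean_pool_and_normalize` and the toy shapes are assumptions made for illustration, and the real model produces 768-dimensional token vectors from MPNet.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average token vectors over non-padding positions, then L2-normalize."""
    # Expand the mask to (batch, seq, 1) so it broadcasts over the embedding axis.
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)      # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)      # avoid division by zero
    pooled = summed / counts                            # mean over real tokens
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)         # unit-length vectors

# Toy input: 2 sequences, 3 token positions, 4 dimensions.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 3, 4))
mask = np.array([[1, 1, 1], [1, 1, 0]])  # second sequence ends with a padding token
sentence_vectors = mean_pool_and_normalize(tokens, mask)
print(sentence_vectors.shape)
```

Because the output vectors have unit length, their dot product equals their cosine similarity, which matches the similarity function stated in the model description.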

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("lizchu414/mpnet-base-all-nli-squad")
# Run inference
sentences = [
    'During what years did the formation of the First Portuguese Republic take place?',
    'Many Portuguese holidays, festivals and traditions have a Christian origin or connotation. Although relations between the Portuguese state and the Roman Catholic Church were generally amiable and stable since the earliest years of the Portuguese nation, their relative power fluctuated. In the 13th and 14th centuries, the church enjoyed both riches and power stemming from its role in the reconquest, its close identification with early Portuguese nationalism and the foundation of the Portuguese educational system, including the first university. The growth of the Portuguese overseas empire made its missionaries important agents of colonization, with important roles in the education and evangelization of people from all the inhabited continents. The growth of liberal and nascent republican movements during the eras leading to the formation of the First Portuguese Republic (1910–26) changed the role and importance of organized religion.',
    'Professional anthropological bodies often object to the use of anthropology for the benefit of the state. Their codes of ethics or statements may proscribe anthropologists from giving secret briefings. The Association of Social Anthropologists of the UK and Commonwealth (ASA) has called certain scholarship ethically dangerous. The AAA\'s current \'Statement of Professional Responsibility\' clearly states that "in relation with their own government and with host governments ... no secret research, no secret reports or debriefings of any kind should be agreed to or given."',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
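
For semantic search, the embeddings can be used to rank passages against a question. The following is a minimal sketch, assuming embeddings are available as NumPy arrays; `rank_passages` is a hypothetical helper, not part of the Sentence Transformers API, and toy vectors stand in for real embeddings so the snippet runs without downloading the model.

```python
import numpy as np

def rank_passages(query_embedding, passage_embeddings, top_k=3):
    """Return (index, score) pairs sorted by descending cosine similarity."""
    q = query_embedding / np.linalg.norm(query_embedding)
    p = passage_embeddings / np.linalg.norm(passage_embeddings, axis=1, keepdims=True)
    scores = p @ q                        # cosine similarity per passage
    order = np.argsort(-scores)[:top_k]   # highest score first
    return [(int(i), float(scores[i])) for i in order]

# With the model loaded as above, the inputs would come from:
#   query_embedding = model.encode(sentences[0])
#   passage_embeddings = model.encode(sentences[1:])
# Toy 3-dimensional vectors are used here instead.
query = np.array([1.0, 0.0, 0.0])
passages = np.array([[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0]])
ranking = rank_passages(query, passages)
print(ranking)
```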

Training Details

Training Dataset

squad

  • Dataset: squad at d84c8c2
  • Size: 87,599 training samples
  • Columns: question and answer
  • Approximate statistics based on the first 1000 samples:
    • question (string): min 6 tokens, mean 14.46 tokens, max 31 tokens
    • answer (string): min 34 tokens, mean 187.2 tokens, max 384 tokens
  • Samples:
    • question: To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?
      answer: Architecturally, the school has a Catholic character. Atop the Main Building's gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.
    • question: What is in front of the Notre Dame Main Building?
      answer: Architecturally, the school has a Catholic character. Atop the Main Building's gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.
    • question: The Basilica of the Sacred heart at Notre Dame is beside to which structure?
      answer: Architecturally, the school has a Catholic character. Atop the Main Building's gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
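
The loss configured above treats, for each question in a batch, its own answer as the positive and the other answers in the batch as negatives, then applies cross-entropy to the cosine similarities multiplied by `scale`. The following NumPy sketch illustrates that idea under those assumptions; `mnr_loss` is a hypothetical name and this is not the library's code.

```python
import numpy as np

def mnr_loss(question_embs, answer_embs, scale=20.0):
    """In-batch-negatives ranking loss on scaled cosine similarities."""
    q = question_embs / np.linalg.norm(question_embs, axis=1, keepdims=True)
    a = answer_embs / np.linalg.norm(answer_embs, axis=1, keepdims=True)
    logits = scale * (q @ a.T)                             # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct answer for question i sits on the diagonal (column i).
    return float(-np.mean(np.diag(log_probs)))

# Toy batch: three orthogonal "question" vectors.
aligned = np.eye(3)
loss_matched = mnr_loss(aligned, aligned)              # each answer matches its question
loss_shuffled = mnr_loss(aligned, aligned[[1, 2, 0]])  # every answer is paired wrongly
print(loss_matched, loss_shuffled)
```

With `scale` at 20.0, a correctly paired batch drives the loss close to zero, while a fully mismatched batch is penalized by roughly the scale value. This also shows why the `no_duplicates` batch sampler matters for this dataset: several SQuAD questions share one answer passage, and a duplicate passage in the batch would be treated as a negative for a question it actually answers.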
    

Evaluation Dataset

squad

  • Dataset: squad at d84c8c2
  • Size: 87,599 evaluation samples
  • Columns: question and answer
  • Approximate statistics based on the first 1000 samples:
    • question (string): min 7 tokens, mean 13.84 tokens, max 31 tokens
    • answer (string): min 28 tokens, mean 151.09 tokens, max 368 tokens
  • Samples:
    • question: What is one purpose of a greenhouse?
      answer: Greenhouses convert solar light to heat, enabling year-round production and the growth (in enclosed environments) of specialty crops and other plants not naturally suited to the local climate. Primitive greenhouses were first used during Roman times to produce cucumbers year-round for the Roman emperor Tiberius. The first modern greenhouses were built in Europe in the 16th century to keep exotic plants brought back from explorations abroad. Greenhouses remain an important part of horticulture today, and plastic transparent materials have also been used to similar effect in polytunnels and row covers.
    • question: What was one of the first uses of a greenhouse?
      answer: Greenhouses convert solar light to heat, enabling year-round production and the growth (in enclosed environments) of specialty crops and other plants not naturally suited to the local climate. Primitive greenhouses were first used during Roman times to produce cucumbers year-round for the Roman emperor Tiberius. The first modern greenhouses were built in Europe in the 16th century to keep exotic plants brought back from explorations abroad. Greenhouses remain an important part of horticulture today, and plastic transparent materials have also been used to similar effect in polytunnels and row covers.
    • question: Where were the first modern greenhouses built?
      answer: Greenhouses convert solar light to heat, enabling year-round production and the growth (in enclosed environments) of specialty crops and other plants not naturally suited to the local climate. Primitive greenhouses were first used during Roman times to produce cucumbers year-round for the Roman emperor Tiberius. The first modern greenhouses were built in Europe in the 16th century to keep exotic plants brought back from explorations abroad. Greenhouses remain an important part of horticulture today, and plastic transparent materials have also been used to similar effect in polytunnels and row covers.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Framework Versions

  • Python: 3.12.7
  • Sentence Transformers: 3.2.0
  • Transformers: 4.45.2
  • PyTorch: 2.2.2+cu121
  • Accelerate: 1.0.1
  • Datasets: 3.0.1
  • Tokenizers: 0.20.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}