metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:32394
  - loss:MultipleNegativesRankingLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
  - source_sentence: >-
      Feel-good Mexican telenovelas set in the 1980s with a focus on elementary
      school kids and their relationships.
    sentences:
      - >-
        Title: America: The Story of Us

        Genres: Documentary

        Overview: From wagon trains crossing the untamed frontier to man's first
        steps on the moon, this series offers a compelling look at the people,
        inventions and events that helped forge the United States of America.

        Tagline: 

        Creator: 

        Stars: Liev Schreiber, Tom Brokaw, Annette Gordon-Reed

        Release Date: 2010-04-25

        Keywords: history
      - >-
        Title: Carrusel

        Genres: Comedy, Drama, Family, Kids, Soap

        Overview: Carrusel is a Mexican telenovela, produced by and first
        broadcast on Televisa in 1989. It covers daily life in a Mexican
        elementary school and the children's relationships with a charismatic
        teacher named Jimena. Among other plot devices, it deals with the
        differences between the upper and lower classes of Mexican society 
        specifically as seen in a romantic relationship between Cirilo, a poor
        black boy, and a spoiled rich girl, Maria Joaquina Villaseñor.

        Tagline: 

        Creator: Abel Santacruz

        Stars: Gabriela Rivero, Pedro Javier Viveros, Ludwika Paleta

        Release Date: 1989-01-19

        Keywords: mexico city, mexico, elementary school, school, family, naive
        children, 1980s, school kids
      - >-
        Title: Dracula

        Genres: Drama

        Overview: It's the late 19th century, and the mysterious Dracula has
        arrived in London, posing as an American entrepreneur who wants to bring
        modern science to Victorian society. He's especially interested in the
        new technology of electricity, which promises to brighten the night -
        useful for someone who avoids the sun. But he has another reason for his
        travels: he hopes to take revenge on those who cursed him with
        immortality centuries earlier. Everything seems to be going according to
        plan... until he becomes infatuated with a woman who appears to be a
        reincarnation of his dead wife.

        Tagline: The legend takes new life.

        Creator: Daniel Knauf, Cole Haddon

        Stars: Jonathan Rhys Meyers, Jessica De Gouw, Katie McGrath

        Release Date: 2013-10-25

        Keywords: london, england, vampire, victorian england, 19th century,
        dracula
  - source_sentence: Know any good TV programs with both Lee Dong-wook and Yoo In-na?
    sentences:
      - >-
        Title: Touch Your Heart

        Genres: Comedy, Drama

        Overview: Hoping to make a comeback after a bad scandal, an actress
        agrees to research a new role by taking a job as a secretary for a
        prickly attorney.

        Tagline: 

        Creator: Park Joon-hwa

        Stars: Lee Dong-wook, Yoo In-na, Lee Sang-woo

        Release Date: 2019-02-06

        Keywords: based on novel or book, assistant, romance, lawyer, law firm,
        opposites attract, entertainment industry, famous actress
      - >-
        Title: Creeped Out

        Genres: Sci-Fi & Fantasy, Mystery

        Overview: A masked figure known as "The Curious" collects tales of dark
        magic, otherworldly encounters and twisted technology in this kids
        anthology series.

        Tagline: 

        Creator: Robert Butler, Bede Blake

        Stars: Aurora Aksnes, William Romain, Jaiden Cannatelli

        Release Date: 2017-10-31

        Keywords: anthology, horror anthology, horror
      - >-
        Title: Love a Lifetime

        Genres: Drama, Sci-Fi & Fantasy, Action & Adventure

        Overview: Amidst a legacy of family feuds, a kind-hearted young woman,
        Rong Hua, crosses paths with the mysterious Nalan Yue while searching
        for a powerful healing artifact. As they fall in love, they uncover a
        deep history of revenge linking their families. With a new threat rising
        and Nalan Yue battling a dark power within, the two must fight to
        overcome the past and protect their future together.

        Tagline: 

        Creator: 

        Stars: Ren Jialun, Zhang Huiwen, Li Yitong

        Release Date: 2020-06-18

        Keywords: love at first sight, romance, hatred, wuxia, successor, web
        series, secondary couple
  - source_sentence: >-
      Memorable drama TV programs focused on life and grappling with
      relationships
    sentences:
      - |-
        Title: El Maleficio
        Genres: Drama
        Overview: 
        Tagline: 
        Creator: Fernanda Villeli
        Stars: Fernando Colunga, Marlene Favela, Sofía Castro
        Release Date: 2023-11-13
        Keywords: 
      - >-
        Title: You Can Do Better

        Genres: Comedy

        Overview: A half-hour brain candy show that tackles major topics like
        drinking, technology, sex, money, and friends. Through a mix of sketch,
        how-to, man-on-the-street and expert interviews, our hosts impart tips
        and tricks that every adult should know. Viewers will learn to be better
        at the subjects no one teaches in school, and they'll get to belly-laugh
        along the way.

        Tagline: 

        Creator: 

        Stars: Abbi Crutchfield, Matthew Latkiewicz, Jessy Greer

        Release Date: 2016-08-23

        Keywords: 
      - >-
        Title: Junjou Romantica

        Genres: Animation, Comedy, Drama

        Overview: Three couples, three intense romances: a student’s tutor
        crosses the line, a loner meets a force of nature, and a carefree man
        faces love he can’t ignore.

        Tagline: 

        Creator: Shungiku Nakamura

        Stars: Hikaru Hanada, Takahiro Sakurai, Nobutoshi Canna

        Release Date: 2008-04-10

        Keywords: college, romance, slice of life, coming of age, based on
        manga, art, teacher student relationship, lgbt, angst, anime, drastic
        change of life, erotic, gay theme, tsundere, boys' love (bl)
  - source_sentence: Compelling dramas exploring the repercussions of past actions
    sentences:
      - >-
        Title: Stay Close

        Genres: Drama, Crime, Mystery

        Overview: When Carlton Flynn vanishes 17 years to the night after
        Stewart Green did, it sets off a chain reaction in the lives of people
        connected to both men.

        Tagline: Everyone has secrets.

        Creator: Harlan Coben

        Stars: Cush Jumbo, James Nesbitt, Richard Armitage

        Release Date: 2021-12-31

        Keywords: suicide, detective, celebrity, reporter, husband, dark
      - |-
        Title: Los misterios de Laura
        Genres: Crime, Drama, Mystery
        Overview: 
        Tagline: 
        Creator: Javier Holgado, Carlos Vila
        Stars: María Pujalte, Fernando Guillén Cuervo, César Camino
        Release Date: 2009-07-27
        Keywords: investigation, investigator, crime investigation
      - >-
        Title: Hitori no Shita: The Outcast

        Genres: Animation, Sci-Fi & Fantasy, Action & Adventure, Comedy

        Overview: Zhang Chulan leads a very common college student's life until
        he finds himself caught up in a terrible incident that happened in a
        small village. As he was walking through a graveyard, he is assaulted by
        zombies. Thinking that it was over for him, a mysterious girl carrying a
        sword suddenly saves him and disappears.

        Tagline: 

        Creator: Dong Man Tang, Mi Er

        Stars: Xiao Liansha, Sheng Feng, Yuntu Cao

        Release Date: 2016-07-09

        Keywords: fighting, advanture, city, based on manhua, fantasy, urban
        fantasy, sino japanese production, passionate, donghua, comedy,
        coproduction, urban adventure, qihuan, dongfang
  - source_sentence: >-
      Memorable drama TV series focused on slight romance and grappling with
      investigation
    sentences:
      - >-
        Title: Reset

        Genres: Drama, Mystery

        Overview: The lives of a college student and a video game designer are
        kept being reset after an explosion on a bus. During each reset, they
        have to work together to find out what the reason for the explosion is.
        Will these two be able to save themselves and their fellow passengers?
        Will they be able to close the time-loop?

        Tagline: 

        Creator: 

        Stars: Bai Jingting, Zhao Jinmai, Liu Tao

        Release Date: 2022-01-11

        Keywords: time travel, investigation, time loop, explosion, slight
        romance, student, suspense
      - >-
        Title: The Boss

        Genres: Comedy, Drama

        Overview: Eliseo is the superintendent of an upscale building. On the
        surface, is cordial and docile in his role, but underneath Eliseo
        believes himself the omnipotent figure of the community  meddling in
        the affairs of residents and pulling strings as he sees fit. Eliseo's
        only concern is protecting his job, which comes under threat by a
        proposed pool project.

        Tagline: 

        Creator: Mariano Cohn, Gastón Duprat

        Stars: Gastón Cocchiarale, Guillermo Francella, Gabriel Goity

        Release Date: 2022-10-26

        Keywords: manipulation, buenos aires, argentina, apartment building,
        scheming, serie argentina, building superintendent
      - >-
        Title: Wildlife Specials

        Genres: Documentary

        Overview: The BBC Wildlife Specials are a series of nature documentary
        programmes commissioned by BBC Television. The Wildlife Specials began
        with a pilot episode in 1995. 20 programmes have been made to date, with
        three of the recent ones being in multi parts. The earlier programmes
        were produced in-house by the BBC's specialist Natural History Unit, but
        the more recent Spy in the... titles were made by the independent John
        Downer Productions. The first 18 programmes, up to 2008, were narrated
        by David Attenborough. The most recent two were narrated by David
        Tennant.


        "The world's leading natural history filmmakers meet the world's most
        charismatic animals"


         BBC tagline

        Tagline: 

        Creator: 

        Stars: David Attenborough

        Release Date: 1995-04-14

        Keywords: animals, nature documentary, cats
pipeline_tag: sentence-similarity
library_name: sentence-transformers

SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

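This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space. The training samples pair natural-language TV show queries with structured show descriptions, so its most direct use is semantic search over such descriptions; the embeddings can also be used for semantic textual similarity, paraphrase mining, text classification, and clustering.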

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
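The Pooling and Normalize stages above can be illustrated with a small sketch. This is not the library's implementation: it is a pure-Python toy that assumes one sentence at a time, and the function name `mean_pool_and_normalize` and the 3-dimensional vectors are made up for illustration (the real model works on 384-dimensional token embeddings).

```python
import math

def mean_pool_and_normalize(token_vectors, attention_mask):
    """Average the non-padding token vectors, then scale the result to unit length.

    token_vectors: list of per-token vectors (lists of floats) for one sentence.
    attention_mask: 1 for real tokens, 0 for padding, same length as token_vectors.
    """
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    dim = len(token_vectors[0])
    # Mean pooling: average each dimension over the real tokens only.
    mean = [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]
    # Normalize: divide by the L2 norm so the output has length 1.
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean]

# Toy input: two real tokens and one padding token that must be ignored.
tokens = [[1.0, 2.0, 2.0], [3.0, 0.0, 2.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]
sentence_vec = mean_pool_and_normalize(tokens, mask)
print(sentence_vec)
```

Because the output vectors have unit length, the dot product of two embeddings equals their cosine similarity, which matches the similarity function listed in the model description.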

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

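# Download from the 🤗 Hub
# "sentence_transformers_model_id" is a placeholder: replace it with this model's Hub ID or a local path
model = SentenceTransformer("sentence_transformers_model_id")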
# Run inference
sentences = [
    'Memorable drama TV series focused on slight romance and grappling with investigation',
    'Title: Reset\nGenres: Drama, Mystery\nOverview: The lives of a college student and a video game designer are kept being reset after an explosion on a bus. During each reset, they have to work together to find out what the reason for the explosion is. Will these two be able to save themselves and their fellow passengers? Will they be able to close the time-loop?\nTagline: \nCreator: \nStars: Bai Jingting, Zhao Jinmai, Liu Tao\nRelease Date: 2022-01-11\nKeywords: time travel, investigation, time loop, explosion, slight romance, student, suspense',
    "Title: The Boss\nGenres: Comedy, Drama\nOverview: Eliseo is the superintendent of an upscale building. On the surface, is cordial and docile in his role, but underneath Eliseo believes himself the omnipotent figure of the community — meddling in the affairs of residents and pulling strings as he sees fit. Eliseo's only concern is protecting his job, which comes under threat by a proposed pool project.\nTagline: \nCreator: Mariano Cohn, Gastón Duprat\nStars: Gastón Cocchiarale, Guillermo Francella, Gabriel Goity\nRelease Date: 2022-10-26\nKeywords: manipulation, buenos aires, argentina, apartment building, scheming, serie argentina, building superintendent",
]
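# encode() returns a NumPy array by default: one 384-dimensional row per input
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the pairwise similarity scores for the embeddings (returned as a torch.Tensor)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])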
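The candidate texts in the example above follow a fixed field layout (Title, Genres, Overview, Tagline, Creator, Stars, Release Date, Keywords, one per line, with empty fields left blank). When encoding your own catalogue, it is probably safest to reproduce that layout. The helper below is a hypothetical sketch of one way to do so; `FIELD_ORDER` and `format_show` are illustrative names, not part of the model or the library.

```python
# Field order as it appears in the example documents of this model card.
FIELD_ORDER = [
    "Title", "Genres", "Overview", "Tagline",
    "Creator", "Stars", "Release Date", "Keywords",
]

def format_show(fields):
    """Render a show as 'Label: value' lines; missing fields become 'Label: '."""
    return "\n".join(f"{label}: {fields.get(label, '')}" for label in FIELD_ORDER)

doc = format_show({
    "Title": "Reset",
    "Genres": "Drama, Mystery",
    "Stars": "Bai Jingting, Zhao Jinmai, Liu Tao",
    "Release Date": "2022-01-11",
})
print(doc)
```

The resulting string can be passed to `model.encode` alongside a query, as in the snippet above.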

Training Details

Training Dataset

Unnamed Dataset

  • Size: 32,394 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:
    |         | sentence_0   | sentence_1    | sentence_2    |
    |---------|--------------|---------------|---------------|
    | type    | string       | string        | string        |
    | min     | 8 tokens     | 38 tokens     | 40 tokens     |
    | mean    | 17.07 tokens | 133.54 tokens | 132.86 tokens |
    | max     | 38 tokens    | 256 tokens    | 256 tokens    |
  • Samples:
    Sample 1
      sentence_0: Dramatic fantasy romance with a touch of destiny and betrayal
      sentence_1:
        Title: Eternal Love
        Genres: Drama, Sci-Fi & Fantasy
        Overview: Three hundred years ago, Bai Qian stood on the Zhu Xian Terrace, turned around and jumped off without regret. Ye Hua stood by the bronze mirror to witness with his own eyes her death. Three hundred years later, in the East Sea Dragon Palace, the two meet unexpectedly. Another lifetime another world, after suffering betrayal Bai Qian no longer feels anything, yet she can't seem to comprehend Ye Hua's actions. Three lives three worlds, her and him, are they fated to love again?
        Tagline:
        Creator:
        Stars: Yang Mi, Mark Chao, Ken Chang Tzu-Yao
        Release Date: 2017-01-30
        Keywords: china, arranged marriage, romance, fate, second chance, older woman younger man relationship, xianxia
      sentence_2:
        Title: Kidsongs
        Genres: Comedy
        Overview: Kidsongs is an American children's media franchise which includes Kidsongs Music Video Stories on DVD and video, The Kidsongs TV Show, CDs of favorite children’s songs and covers of oldies and pop hits from the 50s, 60s and 70s, song books, sheet music, toys and an ecommerce website. Kidsongs was created by producer/writer Carol Rosenstein and director Bruce Gowers of Together Again Video Productions, both of whom are music video and television production veterans. The duo had produced and directed over 100 music videos for Warner Brothers Records and took their idea of music videos for children to the record label. Warner Brothers funded the first video, “A Day at Old MacDonald’s Farm”. Shortly thereafter, a three way partnership between TAVP, WBR and View-Master Video was formed with TAVP being responsible for production and WBR and View-Master responsible for distribution to video and music stores, and toy stores respectively.
        Tagline:
        Creat...
    Sample 2
      sentence_0: Memorable animation TV shows focused on cartoon and grappling with superliga
      sentence_1:
        Title: Supa Strikas
        Genres: Animation
        Overview: With dreams of becoming Super League champions, a talented striker named Shakes and his football team take on rivals while going on global adventures.
        Tagline:
        Creator:
        Stars: Corny Rempel, Kevin Aichele, Chelsea Rankin
        Release Date: 2009-02-15
        Keywords: cartoon, football (soccer), superliga
      sentence_2:
        Title: Grand Hotel
        Genres: Drama, Crime, Mystery
        Overview: Santiago Mendoza owns last family-owned hotel in multicultural Miami Beach, while his glamorous second wife, Gigi, and their adult children enjoy the spoils of success.
        Tagline: Five star hotel. Five star secrets.
        Creator: Brian Tanen
        Stars: Demián Bichir, Roselyn Sánchez, Denyse Tontz
        Release Date: 2019-06-17
        Keywords: miami, florida, hotel, remake, family conflict, upstairs downstairs, wealthy family
    Sample 3
      sentence_0: Any recommendations for top action & adventure TV programs from 2010 featuring Catherine Siachoque?
      sentence_1:
        Title: Missing
        Genres: Mystery, Action & Adventure, Crime
        Overview: The night Elisa’s cousins-Santiago, Flor, and Eduardo, invited her to a nightclub and after a great deal of begging her parents allowed her go. When Danna and he sister-in-law Cecilia went to pick them up, all of them started showing up except for Elisa. As the hours passed, her parent grew more and more desperate and it was then when they decided to call the police and file a missing report.
        Tagline:
        Creator:
        Stars: Sonya Smith, Catherine Siachoque, Jesus Licciardello
        Release Date: 2010-03-08
        Keywords:
      sentence_2:
        Title: Aurora
        Genres: Mystery, Soap, Drama, Crime
        Overview: Having been cryogenically frozen for 20 years, Aurora's heart torn between past and present : memories of an old love and chance of a new one.
        Tagline:
        Creator: Marcela Citterio
        Stars: Sara Maldonado, Eugenio Siller, Sonya Smith
        Release Date: 2010-11-01
        Keywords:
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
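To make the two loss parameters concrete, here is a rough pure-Python sketch of how a multiple-negatives ranking objective is computed for a batch of triplets. It is an illustration under assumptions, not the library code: it assumes sentence_0 is the anchor, sentence_1 the matching description and sentence_2 a non-matching one (which is what the samples above suggest), and the names `cos_sim` and `mnr_loss` and the 2-dimensional toy vectors are hypothetical.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i must pick positive i out of all candidates.

    Candidates are every positive in the batch followed by every negative, so the
    other anchors' positives act as additional in-batch negatives.
    """
    candidates = positives + negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, cand) for cand in candidates]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum_exp - logits[i]  # negative log-probability of the true positive
    return total / len(anchors)

anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

loss_aligned = mnr_loss(anchors, positives, negatives)
loss_swapped = mnr_loss(anchors, negatives, positives)  # deliberately wrong pairing
print(loss_aligned, loss_swapped)
```

The `scale` factor sharpens the softmax over cosine similarities, which otherwise only range from -1 to 1; the aligned pairing yields a much smaller loss than the swapped one.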
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • num_train_epochs: 4
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
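With `lr_scheduler_type: linear`, `learning_rate: 5e-05` and `warmup_steps: 0`, the learning rate should decay in a straight line from its peak to zero over the run. The sketch below shows that shape. The function name `linear_lr` is hypothetical, and the total of 4052 optimizer steps is an assumption derived from 32,394 samples, batch size 32 and 4 epochs on a single device; it is not stated in the card.

```python
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

TOTAL_STEPS = 4052  # assumption: ceil(32394 / 32) * 4, single device, no accumulation

for step in (0, 1000, 2026, 4052):
    print(step, linear_lr(step, TOTAL_STEPS))
```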

Training Logs

Epoch Step Training Loss
0.4936 500 0.864
0.9872 1000 0.5835
1.4808 1500 0.4604
1.9743 2000 0.4476
2.4679 2500 0.3866
2.9615 3000 0.3688
3.4551 3500 0.3353
3.9487 4000 0.3385
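The fractional epoch values in the log can be reproduced from the dataset size and batch size reported earlier. The short check below assumes a single device, `gradient_accumulation_steps: 1` and `dataloader_drop_last: False` (the last two are listed in the hyperparameters); the variable and function names are illustrative.

```python
import math

TRAIN_SAMPLES = 32_394  # dataset size from the training details
BATCH_SIZE = 32         # per_device_train_batch_size
EPOCHS = 4              # num_train_epochs

# The last, partially filled batch still counts as a step when drop_last is False.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

def epoch_at(step):
    """Fractional epoch reached after the given number of optimizer steps."""
    return round(step / steps_per_epoch, 4)

print(steps_per_epoch, total_steps)
for step in range(500, 4001, 500):
    print(step, epoch_at(step))
```

Under these assumptions one epoch is 1013 steps, so the final logged step (4000) falls just short of the 4052 steps in the full run.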

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 3.4.1
  • Transformers: 4.51.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.6.0
  • Datasets: 3.5.1
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}