JJTsao committed · Commit 0379731 · verified · 1 Parent(s): 0f86a19

Update README.md

Files changed (1)
  1. README.md +61 -452
README.md CHANGED
@@ -1,480 +1,89 @@
1
  ---
 
2
  tags:
 
 
3
  - sentence-transformers
4
- - sentence-similarity
5
- - feature-extraction
6
- - generated_from_trainer
7
- - dataset_size:32394
8
- - loss:MultipleNegativesRankingLoss
9
- base_model: sentence-transformers/all-MiniLM-L6-v2
10
- widget:
11
- - source_sentence: Feel-good Mexican telenovelas set in the 1980s with a focus on
12
- elementary school kids and their relationships.
13
- sentences:
14
- - "Title: America: The Story of Us\nGenres: Documentary\nOverview: From wagon trains\
15
- \ crossing the untamed frontier to man's first steps on the moon, this series\
16
- \ offers a compelling look at the people, inventions and events that helped forge\
17
- \ the United States of America.\nTagline: \nCreator: \nStars: Liev Schreiber,\
18
- \ Tom Brokaw, Annette Gordon-Reed\nRelease Date: 2010-04-25\nKeywords: history"
19
- - "Title: Carrusel\nGenres: Comedy, Drama, Family, Kids, Soap\nOverview: Carrusel\
20
- \ is a Mexican telenovela, produced by and first broadcast on Televisa in 1989.\
21
- \ It covers daily life in a Mexican elementary school and the children's relationships\
22
- \ with a charismatic teacher named Jimena. Among other plot devices, it deals\
23
- \ with the differences between the upper and lower classes of Mexican society\
24
- \ — specifically as seen in a romantic relationship between Cirilo, a poor black\
25
- \ boy, and a spoiled rich girl, Maria Joaquina Villaseñor.\nTagline: \nCreator:\
26
- \ Abel Santacruz\nStars: Gabriela Rivero, Pedro Javier Viveros, Ludwika Paleta\n\
27
- Release Date: 1989-01-19\nKeywords: mexico city, mexico, elementary school, school,\
28
- \ family, naive children, 1980s, school kids"
29
- - 'Title: Dracula
30
-
31
- Genres: Drama
32
-
33
- Overview: It''s the late 19th century, and the mysterious Dracula has arrived
34
- in London, posing as an American entrepreneur who wants to bring modern science
35
- to Victorian society. He''s especially interested in the new technology of electricity,
36
- which promises to brighten the night - useful for someone who avoids the sun.
37
- But he has another reason for his travels: he hopes to take revenge on those who
38
- cursed him with immortality centuries earlier. Everything seems to be going according
39
- to plan... until he becomes infatuated with a woman who appears to be a reincarnation
40
- of his dead wife.
41
-
42
- Tagline: The legend takes new life.
43
-
44
- Creator: Daniel Knauf, Cole Haddon
45
-
46
- Stars: Jonathan Rhys Meyers, Jessica De Gouw, Katie McGrath
47
-
48
- Release Date: 2013-10-25
49
-
50
- Keywords: london, england, vampire, victorian england, 19th century, dracula'
51
- - source_sentence: Know any good TV programs with both Lee Dong-wook and Yoo In-na?
52
- sentences:
53
- - "Title: Touch Your Heart\nGenres: Comedy, Drama\nOverview: Hoping to make a comeback\
54
- \ after a bad scandal, an actress agrees to research a new role by taking a job\
55
- \ as a secretary for a prickly attorney.\nTagline: \nCreator: Park Joon-hwa\n\
56
- Stars: Lee Dong-wook, Yoo In-na, Lee Sang-woo\nRelease Date: 2019-02-06\nKeywords:\
57
- \ based on novel or book, assistant, romance, lawyer, law firm, opposites attract,\
58
- \ entertainment industry, famous actress"
59
- - "Title: Creeped Out\nGenres: Sci-Fi & Fantasy, Mystery\nOverview: A masked figure\
60
- \ known as \"The Curious\" collects tales of dark magic, otherworldly encounters\
61
- \ and twisted technology in this kids anthology series.\nTagline: \nCreator: Robert\
62
- \ Butler, Bede Blake\nStars: Aurora Aksnes, William Romain, Jaiden Cannatelli\n\
63
- Release Date: 2017-10-31\nKeywords: anthology, horror anthology, horror"
64
- - "Title: Love a Lifetime\nGenres: Drama, Sci-Fi & Fantasy, Action & Adventure\n\
65
- Overview: Amidst a legacy of family feuds, a kind-hearted young woman, Rong Hua,\
66
- \ crosses paths with the mysterious Nalan Yue while searching for a powerful healing\
67
- \ artifact. As they fall in love, they uncover a deep history of revenge linking\
68
- \ their families. With a new threat rising and Nalan Yue battling a dark power\
69
- \ within, the two must fight to overcome the past and protect their future together.\n\
70
- Tagline: \nCreator: \nStars: Ren Jialun, Zhang Huiwen, Li Yitong\nRelease Date:\
71
- \ 2020-06-18\nKeywords: love at first sight, romance, hatred, wuxia, successor,\
72
- \ web series, secondary couple"
73
- - source_sentence: Memorable drama TV programs focused on life and grappling with
74
- relationships
75
- sentences:
76
- - "Title: El Maleficio\nGenres: Drama\nOverview: \nTagline: \nCreator: Fernanda\
77
- \ Villeli\nStars: Fernando Colunga, Marlene Favela, Sofía Castro\nRelease Date:\
78
- \ 2023-11-13\nKeywords: "
79
- - "Title: You Can Do Better\nGenres: Comedy\nOverview: A half-hour brain candy show\
80
- \ that tackles major topics like drinking, technology, sex, money, and friends.\
81
- \ Through a mix of sketch, how-to, man-on-the-street and expert interviews, our\
82
- \ hosts impart tips and tricks that every adult should know. Viewers will learn\
83
- \ to be better at the subjects no one teaches in school, and they'll get to belly-laugh\
84
- \ along the way.\nTagline: \nCreator: \nStars: Abbi Crutchfield, Matthew Latkiewicz,\
85
- \ Jessy Greer\nRelease Date: 2016-08-23\nKeywords: "
86
- - "Title: Junjou Romantica\nGenres: Animation, Comedy, Drama\nOverview: Three couples,\
87
- \ three intense romances: a student’s tutor crosses the line, a loner meets a\
88
- \ force of nature, and a carefree man faces love he can’t ignore.\nTagline: \n\
89
- Creator: Shungiku Nakamura\nStars: Hikaru Hanada, Takahiro Sakurai, Nobutoshi\
90
- \ Canna\nRelease Date: 2008-04-10\nKeywords: college, romance, slice of life,\
91
- \ coming of age, based on manga, art, teacher student relationship, lgbt, angst,\
92
- \ anime, drastic change of life, erotic, gay theme, tsundere, boys' love (bl)"
93
- - source_sentence: Compelling dramas exploring the repercussions of past actions
94
- sentences:
95
- - 'Title: Stay Close
96
-
97
- Genres: Drama, Crime, Mystery
98
-
99
- Overview: When Carlton Flynn vanishes 17 years to the night after Stewart Green
100
- did, it sets off a chain reaction in the lives of people connected to both men.
101
-
102
- Tagline: Everyone has secrets.
103
-
104
- Creator: Harlan Coben
105
-
106
- Stars: Cush Jumbo, James Nesbitt, Richard Armitage
107
-
108
- Release Date: 2021-12-31
109
-
110
- Keywords: suicide, detective, celebrity, reporter, husband, dark'
111
- - "Title: Los misterios de Laura\nGenres: Crime, Drama, Mystery\nOverview: \nTagline:\
112
- \ \nCreator: Javier Holgado, Carlos Vila\nStars: María Pujalte, Fernando Guillén\
113
- \ Cuervo, César Camino\nRelease Date: 2009-07-27\nKeywords: investigation, investigator,\
114
- \ crime investigation"
115
- - "Title: Hitori no Shita: The Outcast\nGenres: Animation, Sci-Fi & Fantasy, Action\
116
- \ & Adventure, Comedy\nOverview: Zhang Chulan leads a very common college student's\
117
- \ life until he finds himself caught up in a terrible incident that happened in\
118
- \ a small village. As he was walking through a graveyard, he is assaulted by zombies.\
119
- \ Thinking that it was over for him, a mysterious girl carrying a sword suddenly\
120
- \ saves him and disappears.\nTagline: \nCreator: Dong Man Tang, Mi Er\nStars:\
121
- \ Xiao Liansha, Sheng Feng, Yuntu Cao\nRelease Date: 2016-07-09\nKeywords: fighting,\
122
- \ advanture, city, based on manhua, fantasy, urban fantasy, sino japanese production,\
123
- \ passionate, donghua, comedy, coproduction, urban adventure, qihuan, dongfang"
124
- - source_sentence: Memorable drama TV series focused on slight romance and grappling
125
- with investigation
126
- sentences:
127
- - "Title: Reset\nGenres: Drama, Mystery\nOverview: The lives of a college student\
128
- \ and a video game designer are kept being reset after an explosion on a bus.\
129
- \ During each reset, they have to work together to find out what the reason for\
130
- \ the explosion is. Will these two be able to save themselves and their fellow\
131
- \ passengers? Will they be able to close the time-loop?\nTagline: \nCreator: \n\
132
- Stars: Bai Jingting, Zhao Jinmai, Liu Tao\nRelease Date: 2022-01-11\nKeywords:\
133
- \ time travel, investigation, time loop, explosion, slight romance, student, suspense"
134
- - "Title: The Boss\nGenres: Comedy, Drama\nOverview: Eliseo is the superintendent\
135
- \ of an upscale building. On the surface, is cordial and docile in his role, but\
136
- \ underneath Eliseo believes himself the omnipotent figure of the community —\
137
- \ meddling in the affairs of residents and pulling strings as he sees fit. Eliseo's\
138
- \ only concern is protecting his job, which comes under threat by a proposed pool\
139
- \ project.\nTagline: \nCreator: Mariano Cohn, Gastón Duprat\nStars: Gastón Cocchiarale,\
140
- \ Guillermo Francella, Gabriel Goity\nRelease Date: 2022-10-26\nKeywords: manipulation,\
141
- \ buenos aires, argentina, apartment building, scheming, serie argentina, building\
142
- \ superintendent"
143
- - "Title: Wildlife Specials\nGenres: Documentary\nOverview: The BBC Wildlife Specials\
144
- \ are a series of nature documentary programmes commissioned by BBC Television.\
145
- \ The Wildlife Specials began with a pilot episode in 1995. 20 programmes have\
146
- \ been made to date, with three of the recent ones being in multi parts. The earlier\
147
- \ programmes were produced in-house by the BBC's specialist Natural History Unit,\
148
- \ but the more recent Spy in the... titles were made by the independent John Downer\
149
- \ Productions. The first 18 programmes, up to 2008, were narrated by David Attenborough.\
150
- \ The most recent two were narrated by David Tennant.\n\n\"The world's leading\
151
- \ natural history filmmakers meet the world's most charismatic animals\"\n\n—\
152
- \ BBC tagline\nTagline: \nCreator: \nStars: David Attenborough\nRelease Date:\
153
- \ 1995-04-14\nKeywords: animals, nature documentary, cats"
154
- pipeline_tag: sentence-similarity
155
  library_name: sentence-transformers
156
  ---
157
 
158
- # SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
159
 
160
- This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
161
 
162
- ## Model Details
163
 
164
- ### Model Description
165
- - **Model Type:** Sentence Transformer
166
- - **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) <!-- at revision c9745ed1d9f207416be6d2e6f8de32d1f16199bf -->
167
- - **Maximum Sequence Length:** 256 tokens
168
- - **Output Dimensionality:** 384 dimensions
169
- - **Similarity Function:** Cosine Similarity
170
- <!-- - **Training Dataset:** Unknown -->
171
- <!-- - **Language:** Unknown -->
172
- <!-- - **License:** Unknown -->
173
 
174
- ### Model Sources
175
 
176
- - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
177
- - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
178
- - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
179
 
180
- ### Full Model Architecture
 
 
 
181
 
182
- ```
183
- SentenceTransformer(
184
- (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
185
- (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
186
- (2): Normalize()
187
- )
188
- ```
189
 
190
- ## Usage
191
 
192
- ### Direct Usage (Sentence Transformers)
 
 
 
 
 
 
193
 
194
- First install the Sentence Transformers library:
 
 
 
195
 
196
- ```bash
197
- pip install -U sentence-transformers
198
- ```
199
 
200
- Then you can load this model and run inference.
 
201
  ```python
202
  from sentence_transformers import SentenceTransformer
203
 
204
- # Download from the 🤗 Hub
205
- model = SentenceTransformer("sentence_transformers_model_id")
206
- # Run inference
207
- sentences = [
208
- 'Memorable drama TV series focused on slight romance and grappling with investigation',
209
- 'Title: Reset\nGenres: Drama, Mystery\nOverview: The lives of a college student and a video game designer are kept being reset after an explosion on a bus. During each reset, they have to work together to find out what the reason for the explosion is. Will these two be able to save themselves and their fellow passengers? Will they be able to close the time-loop?\nTagline: \nCreator: \nStars: Bai Jingting, Zhao Jinmai, Liu Tao\nRelease Date: 2022-01-11\nKeywords: time travel, investigation, time loop, explosion, slight romance, student, suspense',
210
- "Title: The Boss\nGenres: Comedy, Drama\nOverview: Eliseo is the superintendent of an upscale building. On the surface, is cordial and docile in his role, but underneath Eliseo believes himself the omnipotent figure of the community — meddling in the affairs of residents and pulling strings as he sees fit. Eliseo's only concern is protecting his job, which comes under threat by a proposed pool project.\nTagline: \nCreator: Mariano Cohn, Gastón Duprat\nStars: Gastón Cocchiarale, Guillermo Francella, Gabriel Goity\nRelease Date: 2022-10-26\nKeywords: manipulation, buenos aires, argentina, apartment building, scheming, serie argentina, building superintendent",
211
- ]
212
- embeddings = model.encode(sentences)
213
- print(embeddings.shape)
214
- # [3, 384]
215
-
216
- # Get the similarity scores for the embeddings
217
- similarities = model.similarity(embeddings, embeddings)
218
- print(similarities.shape)
219
- # [3, 3]
220
- ```
221
-
222
- <!--
223
- ### Direct Usage (Transformers)
224
-
225
- <details><summary>Click to see the direct usage in Transformers</summary>
226
-
227
- </details>
228
- -->
229
-
230
- <!--
231
- ### Downstream Usage (Sentence Transformers)
232
-
233
- You can finetune this model on your own dataset.
234
-
235
- <details><summary>Click to expand</summary>
236
-
237
- </details>
238
- -->
239
-
240
- <!--
241
- ### Out-of-Scope Use
242
-
243
- *List how the model may foreseeably be misused and address what users ought not to do with the model.*
244
- -->
245
-
246
- <!--
247
- ## Bias, Risks and Limitations
248
-
249
- *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
250
- -->
251
-
252
- <!--
253
- ### Recommendations
254
-
255
- *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
256
- -->
257
-
258
- ## Training Details
259
-
260
- ### Training Dataset
261
-
262
- #### Unnamed Dataset
263
-
264
- * Size: 32,394 training samples
265
- * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>sentence_2</code>
266
- * Approximate statistics based on the first 1000 samples:
267
- | | sentence_0 | sentence_1 | sentence_2 |
268
- |:--------|:----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
269
- | type | string | string | string |
270
- | details | <ul><li>min: 8 tokens</li><li>mean: 17.07 tokens</li><li>max: 38 tokens</li></ul> | <ul><li>min: 38 tokens</li><li>mean: 133.54 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>min: 40 tokens</li><li>mean: 132.86 tokens</li><li>max: 256 tokens</li></ul> |
271
- * Samples:
272
- | sentence_0 | sentence_1 | sentence_2 |
273
- |:-----------------------------------------------------------------------------------------------------------------|||
274
- | <code>Dramatic fantasy romance with a touch of destiny and betrayal</code> | <code>Title: Eternal Love<br>Genres: Drama, Sci-Fi & Fantasy<br>Overview: Three hundred years ago, Bai Qian stood on the Zhu Xian Terrace, turned around and jumped off without regret. Ye Hua stood by the bronze mirror to witness with his own eyes her death. Three hundred years later, in the East Sea Dragon Palace, the two meet unexpectedly. Another lifetime another world, after suffering betrayal Bai Qian no longer feels anything, yet she can't seem to comprehend Ye Hua's actions. Three lives three worlds, her and him, are they fated to love again?<br>Tagline: <br>Creator: <br>Stars: Yang Mi, Mark Chao, Ken Chang Tzu-Yao<br>Release Date: 2017-01-30<br>Keywords: china, arranged marriage, romance, fate, second chance, older woman younger man relationship, xianxia</code> | <code>Title: Kidsongs<br>Genres: Comedy<br>Overview: Kidsongs is an American children's media franchise which includes Kidsongs Music Video Stories on DVD and video, The Kidsongs TV Show, CDs of favorite children’s songs and covers of oldies and pop hits from the 50s, 60s and 70s, song books, sheet music, toys and an ecommerce website. Kidsongs was created by producer/writer Carol Rosenstein and director Bruce Gowers of Together Again Video Productions, both of whom are music video and television production veterans. The duo had produced and directed over 100 music videos for Warner Brothers Records and took their idea of music videos for children to the record label. Warner Brothers funded the first video, “A Day at Old MacDonald’s Farm”. Shortly thereafter, a three way partnership between TAVP, WBR and View-Master Video was formed with TAVP being responsible for production and WBR and View-Master responsible for distribution to video and music stores, and toy stores respectively.<br>Tagline: <br>Creat...</code> |
275
- | <code>Memorable animation TV shows focused on cartoon and grappling with superliga</code> | <code>Title: Supa Strikas<br>Genres: Animation<br>Overview: With dreams of becoming Super League champions, a talented striker named Shakes and his football team take on rivals while going on global adventures.<br>Tagline: <br>Creator: <br>Stars: Corny Rempel, Kevin Aichele, Chelsea Rankin<br>Release Date: 2009-02-15<br>Keywords: cartoon, football (soccer), superliga</code> | <code>Title: Grand Hotel<br>Genres: Drama, Crime, Mystery<br>Overview: Santiago Mendoza owns last family-owned hotel in multicultural Miami Beach, while his glamorous second wife, Gigi, and their adult children enjoy the spoils of success.<br>Tagline: Five star hotel. Five star secrets.<br>Creator: Brian Tanen<br>Stars: Demián Bichir, Roselyn Sánchez, Denyse Tontz<br>Release Date: 2019-06-17<br>Keywords: miami, florida, hotel, remake, family conflict, upstairs downstairs, wealthy family</code> |
276
- | <code>Any recommendations for top action & adventure TV programs from 2010 featuring Catherine Siachoque?</code> | <code>Title: Missing<br>Genres: Mystery, Action & Adventure, Crime<br>Overview: The night Elisa’s cousins-Santiago, Flor, and Eduardo, invited her to a nightclub and after a great deal of begging her parents allowed her go. When Danna and he sister-in-law Cecilia went to pick them up, all of them started showing up except for Elisa. As the hours passed, her parent grew more and more desperate and it was then when they decided to call the police and file a missing report.<br>Tagline: <br>Creator: <br>Stars: Sonya Smith, Catherine Siachoque, Jesus Licciardello<br>Release Date: 2010-03-08<br>Keywords: </code> | <code>Title: Aurora<br>Genres: Mystery, Soap, Drama, Crime<br>Overview: Having been cryogenically frozen for 20 years, Aurora's heart torn between past and present : memories of an old love and chance of a new one.<br>Tagline: <br>Creator: Marcela Citterio<br>Stars: Sara Maldonado, Eugenio Siller, Sonya Smith<br>Release Date: 2010-11-01<br>Keywords: </code> |
277
- * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
278
- ```json
279
- {
280
- "scale": 20.0,
281
- "similarity_fct": "cos_sim"
282
- }
283
- ```
284
-
285
- ### Training Hyperparameters
286
- #### Non-Default Hyperparameters
287
-
288
- - `per_device_train_batch_size`: 32
289
- - `per_device_eval_batch_size`: 32
290
- - `num_train_epochs`: 4
291
- - `multi_dataset_batch_sampler`: round_robin
292
-
293
- #### All Hyperparameters
294
- <details><summary>Click to expand</summary>
295
-
296
- - `overwrite_output_dir`: False
297
- - `do_predict`: False
298
- - `eval_strategy`: no
299
- - `prediction_loss_only`: True
300
- - `per_device_train_batch_size`: 32
301
- - `per_device_eval_batch_size`: 32
302
- - `per_gpu_train_batch_size`: None
303
- - `per_gpu_eval_batch_size`: None
304
- - `gradient_accumulation_steps`: 1
305
- - `eval_accumulation_steps`: None
306
- - `torch_empty_cache_steps`: None
307
- - `learning_rate`: 5e-05
308
- - `weight_decay`: 0.0
309
- - `adam_beta1`: 0.9
310
- - `adam_beta2`: 0.999
311
- - `adam_epsilon`: 1e-08
312
- - `max_grad_norm`: 1
313
- - `num_train_epochs`: 4
314
- - `max_steps`: -1
315
- - `lr_scheduler_type`: linear
316
- - `lr_scheduler_kwargs`: {}
317
- - `warmup_ratio`: 0.0
318
- - `warmup_steps`: 0
319
- - `log_level`: passive
320
- - `log_level_replica`: warning
321
- - `log_on_each_node`: True
322
- - `logging_nan_inf_filter`: True
323
- - `save_safetensors`: True
324
- - `save_on_each_node`: False
325
- - `save_only_model`: False
326
- - `restore_callback_states_from_checkpoint`: False
327
- - `no_cuda`: False
328
- - `use_cpu`: False
329
- - `use_mps_device`: False
330
- - `seed`: 42
331
- - `data_seed`: None
332
- - `jit_mode_eval`: False
333
- - `use_ipex`: False
334
- - `bf16`: False
335
- - `fp16`: False
336
- - `fp16_opt_level`: O1
337
- - `half_precision_backend`: auto
338
- - `bf16_full_eval`: False
339
- - `fp16_full_eval`: False
340
- - `tf32`: None
341
- - `local_rank`: 0
342
- - `ddp_backend`: None
343
- - `tpu_num_cores`: None
344
- - `tpu_metrics_debug`: False
345
- - `debug`: []
346
- - `dataloader_drop_last`: False
347
- - `dataloader_num_workers`: 0
348
- - `dataloader_prefetch_factor`: None
349
- - `past_index`: -1
350
- - `disable_tqdm`: False
351
- - `remove_unused_columns`: True
352
- - `label_names`: None
353
- - `load_best_model_at_end`: False
354
- - `ignore_data_skip`: False
355
- - `fsdp`: []
356
- - `fsdp_min_num_params`: 0
357
- - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
358
- - `tp_size`: 0
359
- - `fsdp_transformer_layer_cls_to_wrap`: None
360
- - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
361
- - `deepspeed`: None
362
- - `label_smoothing_factor`: 0.0
363
- - `optim`: adamw_torch
364
- - `optim_args`: None
365
- - `adafactor`: False
366
- - `group_by_length`: False
367
- - `length_column_name`: length
368
- - `ddp_find_unused_parameters`: None
369
- - `ddp_bucket_cap_mb`: None
370
- - `ddp_broadcast_buffers`: False
371
- - `dataloader_pin_memory`: True
372
- - `dataloader_persistent_workers`: False
373
- - `skip_memory_metrics`: True
374
- - `use_legacy_prediction_loop`: False
375
- - `push_to_hub`: False
376
- - `resume_from_checkpoint`: None
377
- - `hub_model_id`: None
378
- - `hub_strategy`: every_save
379
- - `hub_private_repo`: None
380
- - `hub_always_push`: False
381
- - `gradient_checkpointing`: False
382
- - `gradient_checkpointing_kwargs`: None
383
- - `include_inputs_for_metrics`: False
384
- - `include_for_metrics`: []
385
- - `eval_do_concat_batches`: True
386
- - `fp16_backend`: auto
387
- - `push_to_hub_model_id`: None
388
- - `push_to_hub_organization`: None
389
- - `mp_parameters`:
390
- - `auto_find_batch_size`: False
391
- - `full_determinism`: False
392
- - `torchdynamo`: None
393
- - `ray_scope`: last
394
- - `ddp_timeout`: 1800
395
- - `torch_compile`: False
396
- - `torch_compile_backend`: None
397
- - `torch_compile_mode`: None
398
- - `include_tokens_per_second`: False
399
- - `include_num_input_tokens_seen`: False
400
- - `neftune_noise_alpha`: None
401
- - `optim_target_modules`: None
402
- - `batch_eval_metrics`: False
403
- - `eval_on_start`: False
404
- - `use_liger_kernel`: False
405
- - `eval_use_gather_object`: False
406
- - `average_tokens_across_devices`: False
407
- - `prompts`: None
408
- - `batch_sampler`: batch_sampler
409
- - `multi_dataset_batch_sampler`: round_robin
410
-
411
- </details>
412
-
413
- ### Training Logs
414
- | Epoch | Step | Training Loss |
415
- |:------:|:----:|:-------------:|
416
- | 0.4936 | 500 | 0.864 |
417
- | 0.9872 | 1000 | 0.5835 |
418
- | 1.4808 | 1500 | 0.4604 |
419
- | 1.9743 | 2000 | 0.4476 |
420
- | 2.4679 | 2500 | 0.3866 |
421
- | 2.9615 | 3000 | 0.3688 |
422
- | 3.4551 | 3500 | 0.3353 |
423
- | 3.9487 | 4000 | 0.3385 |
424
-
425
-
426
- ### Framework Versions
427
- - Python: 3.11.12
428
- - Sentence Transformers: 3.4.1
429
- - Transformers: 4.51.3
430
- - PyTorch: 2.6.0+cu124
431
- - Accelerate: 1.6.0
432
- - Datasets: 3.5.1
433
- - Tokenizers: 0.21.1
434
-
435
- ## Citation
436
-
437
- ### BibTeX
438
-
439
- #### Sentence Transformers
440
- ```bibtex
441
- @inproceedings{reimers-2019-sentence-bert,
442
- title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
443
- author = "Reimers, Nils and Gurevych, Iryna",
444
- booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
445
- month = "11",
446
- year = "2019",
447
- publisher = "Association for Computational Linguistics",
448
- url = "https://arxiv.org/abs/1908.10084",
449
- }
450
- ```
451
-
452
- #### MultipleNegativesRankingLoss
453
- ```bibtex
454
- @misc{henderson2017efficient,
455
- title={Efficient Natural Language Response Suggestion for Smart Reply},
456
- author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
457
- year={2017},
458
- eprint={1705.00652},
459
- archivePrefix={arXiv},
460
- primaryClass={cs.CL}
461
- }
462
  ```
463
 
464
- <!--
465
- ## Glossary
466
 
467
- *Clearly define terms in order to be accessible across audiences.*
468
- -->
469
 
470
- <!--
471
- ## Model Card Authors
 
472
 
473
- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
474
- -->
475
 
476
- <!--
477
- ## Model Card Contact
478
 
479
- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
480
- -->
 
1
  ---
2
+ license: apache-2.0
3
  tags:
4
+ - retrieval
5
+ - tv-show-recommendation
6
  - sentence-transformers
7
+ - semantic-search
8
  library_name: sentence-transformers
9
+ model-index:
10
+ - name: fine-tuned movie retriever
11
+ results:
12
+ - task:
13
+ type: retrieval
14
+ name: Information Retrieval
15
+ metrics:
16
+ - name: Recall@1
17
+ type: recall
18
+ value: 0.454
19
+ - name: Recall@3
20
+ type: recall
21
+ value: 0.676
22
+ - name: Recall@5
23
+ type: recall
24
+ value: 0.730
25
+ - name: Recall@10
26
+ type: recall
27
+ value: 0.797
28
+ metrics:
29
+ - recall
30
+ base_model:
31
+ - sentence-transformers/all-MiniLM-L6-v2
32
  ---
33
 
34
+ # 🎬 Fine-Tuned TV Show Retriever (Rich Semantic & Metadata Queries + Smart Negatives)
35
 
36
+ [![Model](https://img.shields.io/badge/HuggingFace-Model-blue?logo=huggingface)](https://huggingface.co/your-username/my-st-model)
37
 
38
+ This is a custom fine-tuned sentence-transformer model for movie and TV show recommendation systems, optimized for high-quality vector retrieval in a recommendation RAG pipeline. Fine-tuning used ~32K synthetic natural-language queries spanning metadata-based and vibe-based prompts, and combined the following:
39
 
40
+ - Enriched vibe-style natural language queries (e.g., Emotionally powerful space exploration film with themes of love and sacrifice.)
41
+ - Metadata-based natural language queries (e.g., Any crime movies from the 1990s directed by Quentin Tarantino about heist?)
42
+ - Smarter negative sampling (genre contrast, theme mismatch, star-topic confusion)
43
+ - A dataset of over 32,000 triplets (query, positive doc, negative doc)
 
 
 
 
 
44
 
 
45
 
46
+ ## 🧠 Training Details
 
 
47
 
48
+ - Base model: `sentence-transformers/all-MiniLM-L6-v2`
49
+ - Loss function: `MultipleNegativesRankingLoss`
50
+ - Epochs: 4
51
+ - Optimized for: top-k semantic retrieval in RAG systems
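
The objective named above can be illustrated with a small, dependency-free sketch. It is not the training script used for this model; it only mirrors the idea behind `MultipleNegativesRankingLoss`: each query must pick out its own positive document among every positive and negative in the batch, scored by scaled cosine similarity. The toy 2-D vectors and the helper names `cosine` and `mnr_loss` are hypothetical; the scale of 20 and cosine similarity match the loss parameters listed in the earlier auto-generated card.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(queries, positives, negatives, scale=20.0):
    """Mean cross-entropy where query i must select positives[i]
    among all positives and negatives in the batch."""
    candidates = positives + negatives
    losses = []
    for i, query in enumerate(queries):
        logits = [scale * cosine(query, cand) for cand in candidates]
        # Numerically stable log-sum-exp over all candidates.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Hypothetical toy embeddings standing in for (query, positive, negative) triplets.
queries = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

aligned_loss = mnr_loss(queries, positives, negatives)
swapped_loss = mnr_loss(queries, positives[::-1], negatives)
print(aligned_loss, swapped_loss)
```

When queries sit close to their own positives the loss is near zero; swapping the positives makes it large, and that gap is the signal fine-tuning uses to pull matching query-document pairs together.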
52
 
 
 
 
 
 
 
 
53
 
54
+ ## 📈 Evaluation: Fine-tuned vs Base Model
55
 
56
+ | Metric | Fine-Tuned Model Score | Base Model Score |
57
+ |-------------|:----------------------:|:----------------:|
58
+ | Recall@1 | 0.454 | 0.133 |
59
+ | Recall@3 | 0.676 | 0.230 |
60
+ | Recall@5 | 0.730 | 0.279 |
61
+ | Recall@10 | 0.797 | 0.349 |
62
+ | MRR | 0.583 | 0.207 |
63
 
64
+ **Evaluation setup**:
65
+ - Dataset: 3,600 held-out metadata and vibe-style natural queries
66
+ - Method: Top-k ranking using cosine similarity between query and positive documents
67
+ - Goal: Assess top-k retrieval quality in recommendation-like settings
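
As a rough illustration of how ranking metrics of this kind are computed, here is a minimal sketch of Recall@k together with mean reciprocal rank, a common companion metric. It assumes exactly one positive document per query, which the card does not state explicitly. The function names and the toy rankings are hypothetical and are not the author's evaluation code.

```python
def recall_at_k(ranked_ids, relevant_id, k):
    """1.0 if the relevant document appears in the top k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def reciprocal_rank(ranked_ids, relevant_id):
    """1 / rank of the relevant document, or 0.0 if it was not retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0


def evaluate(rankings, gold, ks=(1, 3, 5, 10)):
    """Average Recall@k and reciprocal rank over all queries."""
    n = len(rankings)
    scores = {
        f"Recall@{k}": sum(recall_at_k(r, g, k) for r, g in zip(rankings, gold)) / n
        for k in ks
    }
    scores["MRR"] = sum(reciprocal_rank(r, g) for r, g in zip(rankings, gold)) / n
    return scores


# Hypothetical rankings for four queries whose positive document is "a".
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"], ["b", "c", "d"]]
gold = ["a", "a", "a", "a"]
scores = evaluate(rankings, gold)
print(scores)
```

In a real run, each ranking would come from sorting the catalog by cosine similarity to the query embedding.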
68
 
 
 
 
69
 
70
+ ## 📦 Usage
71
+
72
  ```python
73
  from sentence_transformers import SentenceTransformer
74
 
75
+ model = SentenceTransformer("jjtsao/fine-tuned_tv_show_retriever")
76
+ query_embedding = model.encode("mind-bending sci-fi thrillers from the 2000s about identity")
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
77
  ```
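
Once queries and documents are embedded, retrieval reduces to ranking documents by cosine similarity to the query. The sketch below shows that step in plain Python with hypothetical 3-D vectors standing in for real embeddings; in practice the vectors would come from `model.encode(...)` and have 384 dimensions. The helper name `top_k` and the toy numbers are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query_vec, doc_vecs, k=3):
    """Return the k (doc_id, score) pairs with the highest cosine similarity."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors; real ones would be produced by model.encode(...).
doc_vecs = {
    "Reset": [0.9, 0.1, 0.0],
    "The Boss": [0.1, 0.9, 0.1],
    "Wildlife Specials": [0.0, 0.1, 0.9],
}
query_vec = [0.8, 0.2, 0.1]

results = top_k(query_vec, doc_vecs, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

For catalogs of more than a few thousand titles, a vector index would normally replace this linear scan, but the ranking logic stays the same.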
78
 
 
 
79
 
80
+ ## 🔍 Ideal Use Cases
 
81
 
82
+ - RAG-style movie recommendation apps
83
+ - Semantic filtering of large movie catalogs
84
+ - Query-document reranking pipelines
85
 
 
 
86
 
87
+ ## 📜 License
 
88
 
89
+ Apache 2.0: open for personal and commercial use.