Update README.md
README.md
CHANGED
@@ -1,480 +1,89 @@
|
|
1 |
---
|
|
|
2 |
tags:
|
|
|
|
|
3 |
- sentence-transformers
|
4 |
-
-
|
5 |
-
- feature-extraction
|
6 |
-
- generated_from_trainer
|
7 |
-
- dataset_size:32394
|
8 |
-
- loss:MultipleNegativesRankingLoss
|
9 |
-
base_model: sentence-transformers/all-MiniLM-L6-v2
|
10 |
-
widget:
|
11 |
-
- source_sentence: Feel-good Mexican telenovelas set in the 1980s with a focus on
|
12 |
-
elementary school kids and their relationships.
|
13 |
-
sentences:
|
14 |
-
- "Title: America: The Story of Us\nGenres: Documentary\nOverview: From wagon trains\
|
15 |
-
\ crossing the untamed frontier to man's first steps on the moon, this series\
|
16 |
-
\ offers a compelling look at the people, inventions and events that helped forge\
|
17 |
-
\ the United States of America.\nTagline: \nCreator: \nStars: Liev Schreiber,\
|
18 |
-
\ Tom Brokaw, Annette Gordon-Reed\nRelease Date: 2010-04-25\nKeywords: history"
|
19 |
-
- "Title: Carrusel\nGenres: Comedy, Drama, Family, Kids, Soap\nOverview: Carrusel\
|
20 |
-
\ is a Mexican telenovela, produced by and first broadcast on Televisa in 1989.\
|
21 |
-
\ It covers daily life in a Mexican elementary school and the children's relationships\
|
22 |
-
\ with a charismatic teacher named Jimena. Among other plot devices, it deals\
|
23 |
-
\ with the differences between the upper and lower classes of Mexican society\
|
24 |
-
\ — specifically as seen in a romantic relationship between Cirilo, a poor black\
|
25 |
-
\ boy, and a spoiled rich girl, Maria Joaquina Villaseñor.\nTagline: \nCreator:\
|
26 |
-
\ Abel Santacruz\nStars: Gabriela Rivero, Pedro Javier Viveros, Ludwika Paleta\n\
|
27 |
-
Release Date: 1989-01-19\nKeywords: mexico city, mexico, elementary school, school,\
|
28 |
-
\ family, naive children, 1980s, school kids"
|
29 |
-
- 'Title: Dracula
|
30 |
-
|
31 |
-
Genres: Drama
|
32 |
-
|
33 |
-
Overview: It''s the late 19th century, and the mysterious Dracula has arrived
|
34 |
-
in London, posing as an American entrepreneur who wants to bring modern science
|
35 |
-
to Victorian society. He''s especially interested in the new technology of electricity,
|
36 |
-
which promises to brighten the night - useful for someone who avoids the sun.
|
37 |
-
But he has another reason for his travels: he hopes to take revenge on those who
|
38 |
-
cursed him with immortality centuries earlier. Everything seems to be going according
|
39 |
-
to plan... until he becomes infatuated with a woman who appears to be a reincarnation
|
40 |
-
of his dead wife.
|
41 |
-
|
42 |
-
Tagline: The legend takes new life.
|
43 |
-
|
44 |
-
Creator: Daniel Knauf, Cole Haddon
|
45 |
-
|
46 |
-
Stars: Jonathan Rhys Meyers, Jessica De Gouw, Katie McGrath
|
47 |
-
|
48 |
-
Release Date: 2013-10-25
|
49 |
-
|
50 |
-
Keywords: london, england, vampire, victorian england, 19th century, dracula'
|
51 |
-
- source_sentence: Know any good TV programs with both Lee Dong-wook and Yoo In-na?
|
52 |
-
sentences:
|
53 |
-
- "Title: Touch Your Heart\nGenres: Comedy, Drama\nOverview: Hoping to make a comeback\
|
54 |
-
\ after a bad scandal, an actress agrees to research a new role by taking a job\
|
55 |
-
\ as a secretary for a prickly attorney.\nTagline: \nCreator: Park Joon-hwa\n\
|
56 |
-
Stars: Lee Dong-wook, Yoo In-na, Lee Sang-woo\nRelease Date: 2019-02-06\nKeywords:\
|
57 |
-
\ based on novel or book, assistant, romance, lawyer, law firm, opposites attract,\
|
58 |
-
\ entertainment industry, famous actress"
|
59 |
-
- "Title: Creeped Out\nGenres: Sci-Fi & Fantasy, Mystery\nOverview: A masked figure\
|
60 |
-
\ known as \"The Curious\" collects tales of dark magic, otherworldly encounters\
|
61 |
-
\ and twisted technology in this kids anthology series.\nTagline: \nCreator: Robert\
|
62 |
-
\ Butler, Bede Blake\nStars: Aurora Aksnes, William Romain, Jaiden Cannatelli\n\
|
63 |
-
Release Date: 2017-10-31\nKeywords: anthology, horror anthology, horror"
|
64 |
-
- "Title: Love a Lifetime\nGenres: Drama, Sci-Fi & Fantasy, Action & Adventure\n\
|
65 |
-
Overview: Amidst a legacy of family feuds, a kind-hearted young woman, Rong Hua,\
|
66 |
-
\ crosses paths with the mysterious Nalan Yue while searching for a powerful healing\
|
67 |
-
\ artifact. As they fall in love, they uncover a deep history of revenge linking\
|
68 |
-
\ their families. With a new threat rising and Nalan Yue battling a dark power\
|
69 |
-
\ within, the two must fight to overcome the past and protect their future together.\n\
|
70 |
-
Tagline: \nCreator: \nStars: Ren Jialun, Zhang Huiwen, Li Yitong\nRelease Date:\
|
71 |
-
\ 2020-06-18\nKeywords: love at first sight, romance, hatred, wuxia, successor,\
|
72 |
-
\ web series, secondary couple"
|
73 |
-
- source_sentence: Memorable drama TV programs focused on life and grappling with
|
74 |
-
relationships
|
75 |
-
sentences:
|
76 |
-
- "Title: El Maleficio\nGenres: Drama\nOverview: \nTagline: \nCreator: Fernanda\
|
77 |
-
\ Villeli\nStars: Fernando Colunga, Marlene Favela, Sofía Castro\nRelease Date:\
|
78 |
-
\ 2023-11-13\nKeywords: "
|
79 |
-
- "Title: You Can Do Better\nGenres: Comedy\nOverview: A half-hour brain candy show\
|
80 |
-
\ that tackles major topics like drinking, technology, sex, money, and friends.\
|
81 |
-
\ Through a mix of sketch, how-to, man-on-the-street and expert interviews, our\
|
82 |
-
\ hosts impart tips and tricks that every adult should know. Viewers will learn\
|
83 |
-
\ to be better at the subjects no one teaches in school, and they'll get to belly-laugh\
|
84 |
-
\ along the way.\nTagline: \nCreator: \nStars: Abbi Crutchfield, Matthew Latkiewicz,\
|
85 |
-
\ Jessy Greer\nRelease Date: 2016-08-23\nKeywords: "
|
86 |
-
- "Title: Junjou Romantica\nGenres: Animation, Comedy, Drama\nOverview: Three couples,\
|
87 |
-
\ three intense romances: a student’s tutor crosses the line, a loner meets a\
|
88 |
-
\ force of nature, and a carefree man faces love he can’t ignore.\nTagline: \n\
|
89 |
-
Creator: Shungiku Nakamura\nStars: Hikaru Hanada, Takahiro Sakurai, Nobutoshi\
|
90 |
-
\ Canna\nRelease Date: 2008-04-10\nKeywords: college, romance, slice of life,\
|
91 |
-
\ coming of age, based on manga, art, teacher student relationship, lgbt, angst,\
|
92 |
-
\ anime, drastic change of life, erotic, gay theme, tsundere, boys' love (bl)"
|
93 |
-
- source_sentence: Compelling dramas exploring the repercussions of past actions
|
94 |
-
sentences:
|
95 |
-
- 'Title: Stay Close
|
96 |
-
|
97 |
-
Genres: Drama, Crime, Mystery
|
98 |
-
|
99 |
-
Overview: When Carlton Flynn vanishes 17 years to the night after Stewart Green
|
100 |
-
did, it sets off a chain reaction in the lives of people connected to both men.
|
101 |
-
|
102 |
-
Tagline: Everyone has secrets.
|
103 |
-
|
104 |
-
Creator: Harlan Coben
|
105 |
-
|
106 |
-
Stars: Cush Jumbo, James Nesbitt, Richard Armitage
|
107 |
-
|
108 |
-
Release Date: 2021-12-31
|
109 |
-
|
110 |
-
Keywords: suicide, detective, celebrity, reporter, husband, dark'
|
111 |
-
- "Title: Los misterios de Laura\nGenres: Crime, Drama, Mystery\nOverview: \nTagline:\
|
112 |
-
\ \nCreator: Javier Holgado, Carlos Vila\nStars: María Pujalte, Fernando Guillén\
|
113 |
-
\ Cuervo, César Camino\nRelease Date: 2009-07-27\nKeywords: investigation, investigator,\
|
114 |
-
\ crime investigation"
|
115 |
-
- "Title: Hitori no Shita: The Outcast\nGenres: Animation, Sci-Fi & Fantasy, Action\
|
116 |
-
\ & Adventure, Comedy\nOverview: Zhang Chulan leads a very common college student's\
|
117 |
-
\ life until he finds himself caught up in a terrible incident that happened in\
|
118 |
-
\ a small village. As he was walking through a graveyard, he is assaulted by zombies.\
|
119 |
-
\ Thinking that it was over for him, a mysterious girl carrying a sword suddenly\
|
120 |
-
\ saves him and disappears.\nTagline: \nCreator: Dong Man Tang, Mi Er\nStars:\
|
121 |
-
\ Xiao Liansha, Sheng Feng, Yuntu Cao\nRelease Date: 2016-07-09\nKeywords: fighting,\
|
122 |
-
\ advanture, city, based on manhua, fantasy, urban fantasy, sino japanese production,\
|
123 |
-
\ passionate, donghua, comedy, coproduction, urban adventure, qihuan, dongfang"
|
124 |
-
- source_sentence: Memorable drama TV series focused on slight romance and grappling
|
125 |
-
with investigation
|
126 |
-
sentences:
|
127 |
-
- "Title: Reset\nGenres: Drama, Mystery\nOverview: The lives of a college student\
|
128 |
-
\ and a video game designer are kept being reset after an explosion on a bus.\
|
129 |
-
\ During each reset, they have to work together to find out what the reason for\
|
130 |
-
\ the explosion is. Will these two be able to save themselves and their fellow\
|
131 |
-
\ passengers? Will they be able to close the time-loop?\nTagline: \nCreator: \n\
|
132 |
-
Stars: Bai Jingting, Zhao Jinmai, Liu Tao\nRelease Date: 2022-01-11\nKeywords:\
|
133 |
-
\ time travel, investigation, time loop, explosion, slight romance, student, suspense"
|
134 |
-
- "Title: The Boss\nGenres: Comedy, Drama\nOverview: Eliseo is the superintendent\
|
135 |
-
\ of an upscale building. On the surface, is cordial and docile in his role, but\
|
136 |
-
\ underneath Eliseo believes himself the omnipotent figure of the community —\
|
137 |
-
\ meddling in the affairs of residents and pulling strings as he sees fit. Eliseo's\
|
138 |
-
\ only concern is protecting his job, which comes under threat by a proposed pool\
|
139 |
-
\ project.\nTagline: \nCreator: Mariano Cohn, Gastón Duprat\nStars: Gastón Cocchiarale,\
|
140 |
-
\ Guillermo Francella, Gabriel Goity\nRelease Date: 2022-10-26\nKeywords: manipulation,\
|
141 |
-
\ buenos aires, argentina, apartment building, scheming, serie argentina, building\
|
142 |
-
\ superintendent"
|
143 |
-
- "Title: Wildlife Specials\nGenres: Documentary\nOverview: The BBC Wildlife Specials\
|
144 |
-
\ are a series of nature documentary programmes commissioned by BBC Television.\
|
145 |
-
\ The Wildlife Specials began with a pilot episode in 1995. 20 programmes have\
|
146 |
-
\ been made to date, with three of the recent ones being in multi parts. The earlier\
|
147 |
-
\ programmes were produced in-house by the BBC's specialist Natural History Unit,\
|
148 |
-
\ but the more recent Spy in the... titles were made by the independent John Downer\
|
149 |
-
\ Productions. The first 18 programmes, up to 2008, were narrated by David Attenborough.\
|
150 |
-
\ The most recent two were narrated by David Tennant.\n\n\"The world's leading\
|
151 |
-
\ natural history filmmakers meet the world's most charismatic animals\"\n\n—\
|
152 |
-
\ BBC tagline\nTagline: \nCreator: \nStars: David Attenborough\nRelease Date:\
|
153 |
-
\ 1995-04-14\nKeywords: animals, nature documentary, cats"
|
154 |
-
pipeline_tag: sentence-similarity
|
155 |
library_name: sentence-transformers
|
156 |
---
|
157 |
|
158 |
-
#
|
159 |
|
160 |
-
|
161 |
|
162 |
-
|
163 |
|
164 |
-
|
165 |
-
-
|
166 |
-
-
|
167 |
-
-
|
168 |
-
- **Output Dimensionality:** 384 dimensions
|
169 |
-
- **Similarity Function:** Cosine Similarity
|
170 |
-
<!-- - **Training Dataset:** Unknown -->
|
171 |
-
<!-- - **Language:** Unknown -->
|
172 |
-
<!-- - **License:** Unknown -->
|
173 |
|
174 |
-
### Model Sources
|
175 |
|
176 |
-
|
177 |
-
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
|
178 |
-
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
|
179 |
|
180 |
-
|
|
|
|
|
|
|
181 |
|
182 |
-
```
|
183 |
-
SentenceTransformer(
|
184 |
-
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
|
185 |
-
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
|
186 |
-
(2): Normalize()
|
187 |
-
)
|
188 |
-
```
|
189 |
|
190 |
-
##
|
191 |
|
192 |
-
|
193 |
|
194 |
-
|
|
|
|
|
|
|
195 |
|
196 |
-
```bash
|
197 |
-
pip install -U sentence-transformers
|
198 |
-
```
|
199 |
|
200 |
-
|
|
|
201 |
```python
|
202 |
from sentence_transformers import SentenceTransformer
|
203 |
|
204 |
-
|
205 |
-
|
206 |
-
# Run inference
|
207 |
-
sentences = [
|
208 |
-
'Memorable drama TV series focused on slight romance and grappling with investigation',
|
209 |
-
'Title: Reset\nGenres: Drama, Mystery\nOverview: The lives of a college student and a video game designer are kept being reset after an explosion on a bus. During each reset, they have to work together to find out what the reason for the explosion is. Will these two be able to save themselves and their fellow passengers? Will they be able to close the time-loop?\nTagline: \nCreator: \nStars: Bai Jingting, Zhao Jinmai, Liu Tao\nRelease Date: 2022-01-11\nKeywords: time travel, investigation, time loop, explosion, slight romance, student, suspense',
|
210 |
-
"Title: The Boss\nGenres: Comedy, Drama\nOverview: Eliseo is the superintendent of an upscale building. On the surface, is cordial and docile in his role, but underneath Eliseo believes himself the omnipotent figure of the community — meddling in the affairs of residents and pulling strings as he sees fit. Eliseo's only concern is protecting his job, which comes under threat by a proposed pool project.\nTagline: \nCreator: Mariano Cohn, Gastón Duprat\nStars: Gastón Cocchiarale, Guillermo Francella, Gabriel Goity\nRelease Date: 2022-10-26\nKeywords: manipulation, buenos aires, argentina, apartment building, scheming, serie argentina, building superintendent",
|
211 |
-
]
|
212 |
-
embeddings = model.encode(sentences)
|
213 |
-
print(embeddings.shape)
|
214 |
-
# [3, 384]
|
215 |
-
|
216 |
-
# Get the similarity scores for the embeddings
|
217 |
-
similarities = model.similarity(embeddings, embeddings)
|
218 |
-
print(similarities.shape)
|
219 |
-
# [3, 3]
|
220 |
-
```
|
221 |
-
|
222 |
-
<!--
|
223 |
-
### Direct Usage (Transformers)
|
224 |
-
|
225 |
-
<details><summary>Click to see the direct usage in Transformers</summary>
|
226 |
-
|
227 |
-
</details>
|
228 |
-
-->
|
229 |
-
|
230 |
-
<!--
|
231 |
-
### Downstream Usage (Sentence Transformers)
|
232 |
-
|
233 |
-
You can finetune this model on your own dataset.
|
234 |
-
|
235 |
-
<details><summary>Click to expand</summary>
|
236 |
-
|
237 |
-
</details>
|
238 |
-
-->
|
239 |
-
|
240 |
-
<!--
|
241 |
-
### Out-of-Scope Use
|
242 |
-
|
243 |
-
*List how the model may foreseeably be misused and address what users ought not to do with the model.*
|
244 |
-
-->
|
245 |
-
|
246 |
-
<!--
|
247 |
-
## Bias, Risks and Limitations
|
248 |
-
|
249 |
-
*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
|
250 |
-
-->
|
251 |
-
|
252 |
-
<!--
|
253 |
-
### Recommendations
|
254 |
-
|
255 |
-
*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
|
256 |
-
-->
|
257 |
-
|
258 |
-
## Training Details
|
259 |
-
|
260 |
-
### Training Dataset
|
261 |
-
|
262 |
-
#### Unnamed Dataset
|
263 |
-
|
264 |
-
* Size: 32,394 training samples
|
265 |
-
* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>sentence_2</code>
|
266 |
-
* Approximate statistics based on the first 1000 samples:
|
267 |
-
| | sentence_0 | sentence_1 | sentence_2 |
|
268 |
-
|:--------|:----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
|
269 |
-
| type | string | string | string |
|
270 |
-
| details | <ul><li>min: 8 tokens</li><li>mean: 17.07 tokens</li><li>max: 38 tokens</li></ul> | <ul><li>min: 38 tokens</li><li>mean: 133.54 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>min: 40 tokens</li><li>mean: 132.86 tokens</li><li>max: 256 tokens</li></ul> |
|
271 |
-
* Samples:
|
272 |
-
| sentence_0 | sentence_1 | sentence_2 |
|
273 |
-
|:-----------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
274 |
-
| <code>Dramatic fantasy romance with a touch of destiny and betrayal</code> | <code>Title: Eternal Love<br>Genres: Drama, Sci-Fi & Fantasy<br>Overview: Three hundred years ago, Bai Qian stood on the Zhu Xian Terrace, turned around and jumped off without regret. Ye Hua stood by the bronze mirror to witness with his own eyes her death. Three hundred years later, in the East Sea Dragon Palace, the two meet unexpectedly. Another lifetime another world, after suffering betrayal Bai Qian no longer feels anything, yet she can't seem to comprehend Ye Hua's actions. Three lives three worlds, her and him, are they fated to love again?<br>Tagline: <br>Creator: <br>Stars: Yang Mi, Mark Chao, Ken Chang Tzu-Yao<br>Release Date: 2017-01-30<br>Keywords: china, arranged marriage, romance, fate, second chance, older woman younger man relationship, xianxia</code> | <code>Title: Kidsongs<br>Genres: Comedy<br>Overview: Kidsongs is an American children's media franchise which includes Kidsongs Music Video Stories on DVD and video, The Kidsongs TV Show, CDs of favorite children’s songs and covers of oldies and pop hits from the 50s, 60s and 70s, song books, sheet music, toys and an ecommerce website. Kidsongs was created by producer/writer Carol Rosenstein and director Bruce Gowers of Together Again Video Productions, both of whom are music video and television production veterans. The duo had produced and directed over 100 music videos for Warner Brothers Records and took their idea of music videos for children to the record label. Warner Brothers funded the first video, “A Day at Old MacDonald’s Farm”. Shortly thereafter, a three way partnership between TAVP, WBR and View-Master Video was formed with TAVP being responsible for production and WBR and View-Master responsible for distribution to video and music stores, and toy stores respectively.<br>Tagline: <br>Creat...</code> |
|
275 |
-
| <code>Memorable animation TV shows focused on cartoon and grappling with superliga</code> | <code>Title: Supa Strikas<br>Genres: Animation<br>Overview: With dreams of becoming Super League champions, a talented striker named Shakes and his football team take on rivals while going on global adventures.<br>Tagline: <br>Creator: <br>Stars: Corny Rempel, Kevin Aichele, Chelsea Rankin<br>Release Date: 2009-02-15<br>Keywords: cartoon, football (soccer), superliga</code> | <code>Title: Grand Hotel<br>Genres: Drama, Crime, Mystery<br>Overview: Santiago Mendoza owns last family-owned hotel in multicultural Miami Beach, while his glamorous second wife, Gigi, and their adult children enjoy the spoils of success.<br>Tagline: Five star hotel. Five star secrets.<br>Creator: Brian Tanen<br>Stars: Demián Bichir, Roselyn Sánchez, Denyse Tontz<br>Release Date: 2019-06-17<br>Keywords: miami, florida, hotel, remake, family conflict, upstairs downstairs, wealthy family</code> |
|
276 |
-
| <code>Any recommendations for top action & adventure TV programs from 2010 featuring Catherine Siachoque?</code> | <code>Title: Missing<br>Genres: Mystery, Action & Adventure, Crime<br>Overview: The night Elisa’s cousins-Santiago, Flor, and Eduardo, invited her to a nightclub and after a great deal of begging her parents allowed her go. When Danna and he sister-in-law Cecilia went to pick them up, all of them started showing up except for Elisa. As the hours passed, her parent grew more and more desperate and it was then when they decided to call the police and file a missing report.<br>Tagline: <br>Creator: <br>Stars: Sonya Smith, Catherine Siachoque, Jesus Licciardello<br>Release Date: 2010-03-08<br>Keywords: </code> | <code>Title: Aurora<br>Genres: Mystery, Soap, Drama, Crime<br>Overview: Having been cryogenically frozen for 20 years, Aurora's heart torn between past and present : memories of an old love and chance of a new one.<br>Tagline: <br>Creator: Marcela Citterio<br>Stars: Sara Maldonado, Eugenio Siller, Sonya Smith<br>Release Date: 2010-11-01<br>Keywords: </code> |
|
277 |
-
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
|
278 |
-
```json
|
279 |
-
{
|
280 |
-
"scale": 20.0,
|
281 |
-
"similarity_fct": "cos_sim"
|
282 |
-
}
|
283 |
-
```
|
284 |
-
|
285 |
-
### Training Hyperparameters
|
286 |
-
#### Non-Default Hyperparameters
|
287 |
-
|
288 |
-
- `per_device_train_batch_size`: 32
|
289 |
-
- `per_device_eval_batch_size`: 32
|
290 |
-
- `num_train_epochs`: 4
|
291 |
-
- `multi_dataset_batch_sampler`: round_robin
|
292 |
-
|
293 |
-
#### All Hyperparameters
|
294 |
-
<details><summary>Click to expand</summary>
|
295 |
-
|
296 |
-
- `overwrite_output_dir`: False
|
297 |
-
- `do_predict`: False
|
298 |
-
- `eval_strategy`: no
|
299 |
-
- `prediction_loss_only`: True
|
300 |
-
- `per_device_train_batch_size`: 32
|
301 |
-
- `per_device_eval_batch_size`: 32
|
302 |
-
- `per_gpu_train_batch_size`: None
|
303 |
-
- `per_gpu_eval_batch_size`: None
|
304 |
-
- `gradient_accumulation_steps`: 1
|
305 |
-
- `eval_accumulation_steps`: None
|
306 |
-
- `torch_empty_cache_steps`: None
|
307 |
-
- `learning_rate`: 5e-05
|
308 |
-
- `weight_decay`: 0.0
|
309 |
-
- `adam_beta1`: 0.9
|
310 |
-
- `adam_beta2`: 0.999
|
311 |
-
- `adam_epsilon`: 1e-08
|
312 |
-
- `max_grad_norm`: 1
|
313 |
-
- `num_train_epochs`: 4
|
314 |
-
- `max_steps`: -1
|
315 |
-
- `lr_scheduler_type`: linear
|
316 |
-
- `lr_scheduler_kwargs`: {}
|
317 |
-
- `warmup_ratio`: 0.0
|
318 |
-
- `warmup_steps`: 0
|
319 |
-
- `log_level`: passive
|
320 |
-
- `log_level_replica`: warning
|
321 |
-
- `log_on_each_node`: True
|
322 |
-
- `logging_nan_inf_filter`: True
|
323 |
-
- `save_safetensors`: True
|
324 |
-
- `save_on_each_node`: False
|
325 |
-
- `save_only_model`: False
|
326 |
-
- `restore_callback_states_from_checkpoint`: False
|
327 |
-
- `no_cuda`: False
|
328 |
-
- `use_cpu`: False
|
329 |
-
- `use_mps_device`: False
|
330 |
-
- `seed`: 42
|
331 |
-
- `data_seed`: None
|
332 |
-
- `jit_mode_eval`: False
|
333 |
-
- `use_ipex`: False
|
334 |
-
- `bf16`: False
|
335 |
-
- `fp16`: False
|
336 |
-
- `fp16_opt_level`: O1
|
337 |
-
- `half_precision_backend`: auto
|
338 |
-
- `bf16_full_eval`: False
|
339 |
-
- `fp16_full_eval`: False
|
340 |
-
- `tf32`: None
|
341 |
-
- `local_rank`: 0
|
342 |
-
- `ddp_backend`: None
|
343 |
-
- `tpu_num_cores`: None
|
344 |
-
- `tpu_metrics_debug`: False
|
345 |
-
- `debug`: []
|
346 |
-
- `dataloader_drop_last`: False
|
347 |
-
- `dataloader_num_workers`: 0
|
348 |
-
- `dataloader_prefetch_factor`: None
|
349 |
-
- `past_index`: -1
|
350 |
-
- `disable_tqdm`: False
|
351 |
-
- `remove_unused_columns`: True
|
352 |
-
- `label_names`: None
|
353 |
-
- `load_best_model_at_end`: False
|
354 |
-
- `ignore_data_skip`: False
|
355 |
-
- `fsdp`: []
|
356 |
-
- `fsdp_min_num_params`: 0
|
357 |
-
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
|
358 |
-
- `tp_size`: 0
|
359 |
-
- `fsdp_transformer_layer_cls_to_wrap`: None
|
360 |
-
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
|
361 |
-
- `deepspeed`: None
|
362 |
-
- `label_smoothing_factor`: 0.0
|
363 |
-
- `optim`: adamw_torch
|
364 |
-
- `optim_args`: None
|
365 |
-
- `adafactor`: False
|
366 |
-
- `group_by_length`: False
|
367 |
-
- `length_column_name`: length
|
368 |
-
- `ddp_find_unused_parameters`: None
|
369 |
-
- `ddp_bucket_cap_mb`: None
|
370 |
-
- `ddp_broadcast_buffers`: False
|
371 |
-
- `dataloader_pin_memory`: True
|
372 |
-
- `dataloader_persistent_workers`: False
|
373 |
-
- `skip_memory_metrics`: True
|
374 |
-
- `use_legacy_prediction_loop`: False
|
375 |
-
- `push_to_hub`: False
|
376 |
-
- `resume_from_checkpoint`: None
|
377 |
-
- `hub_model_id`: None
|
378 |
-
- `hub_strategy`: every_save
|
379 |
-
- `hub_private_repo`: None
|
380 |
-
- `hub_always_push`: False
|
381 |
-
- `gradient_checkpointing`: False
|
382 |
-
- `gradient_checkpointing_kwargs`: None
|
383 |
-
- `include_inputs_for_metrics`: False
|
384 |
-
- `include_for_metrics`: []
|
385 |
-
- `eval_do_concat_batches`: True
|
386 |
-
- `fp16_backend`: auto
|
387 |
-
- `push_to_hub_model_id`: None
|
388 |
-
- `push_to_hub_organization`: None
|
389 |
-
- `mp_parameters`:
|
390 |
-
- `auto_find_batch_size`: False
|
391 |
-
- `full_determinism`: False
|
392 |
-
- `torchdynamo`: None
|
393 |
-
- `ray_scope`: last
|
394 |
-
- `ddp_timeout`: 1800
|
395 |
-
- `torch_compile`: False
|
396 |
-
- `torch_compile_backend`: None
|
397 |
-
- `torch_compile_mode`: None
|
398 |
-
- `include_tokens_per_second`: False
|
399 |
-
- `include_num_input_tokens_seen`: False
|
400 |
-
- `neftune_noise_alpha`: None
|
401 |
-
- `optim_target_modules`: None
|
402 |
-
- `batch_eval_metrics`: False
|
403 |
-
- `eval_on_start`: False
|
404 |
-
- `use_liger_kernel`: False
|
405 |
-
- `eval_use_gather_object`: False
|
406 |
-
- `average_tokens_across_devices`: False
|
407 |
-
- `prompts`: None
|
408 |
-
- `batch_sampler`: batch_sampler
|
409 |
-
- `multi_dataset_batch_sampler`: round_robin
|
410 |
-
|
411 |
-
</details>
|
412 |
-
|
413 |
-
### Training Logs
|
414 |
-
| Epoch | Step | Training Loss |
|
415 |
-
|:------:|:----:|:-------------:|
|
416 |
-
| 0.4936 | 500 | 0.864 |
|
417 |
-
| 0.9872 | 1000 | 0.5835 |
|
418 |
-
| 1.4808 | 1500 | 0.4604 |
|
419 |
-
| 1.9743 | 2000 | 0.4476 |
|
420 |
-
| 2.4679 | 2500 | 0.3866 |
|
421 |
-
| 2.9615 | 3000 | 0.3688 |
|
422 |
-
| 3.4551 | 3500 | 0.3353 |
|
423 |
-
| 3.9487 | 4000 | 0.3385 |
|
424 |
-
|
425 |
-
|
426 |
-
### Framework Versions
|
427 |
-
- Python: 3.11.12
|
428 |
-
- Sentence Transformers: 3.4.1
|
429 |
-
- Transformers: 4.51.3
|
430 |
-
- PyTorch: 2.6.0+cu124
|
431 |
-
- Accelerate: 1.6.0
|
432 |
-
- Datasets: 3.5.1
|
433 |
-
- Tokenizers: 0.21.1
|
434 |
-
|
435 |
-
## Citation
|
436 |
-
|
437 |
-
### BibTeX
|
438 |
-
|
439 |
-
#### Sentence Transformers
|
440 |
-
```bibtex
|
441 |
-
@inproceedings{reimers-2019-sentence-bert,
|
442 |
-
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
|
443 |
-
author = "Reimers, Nils and Gurevych, Iryna",
|
444 |
-
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
|
445 |
-
month = "11",
|
446 |
-
year = "2019",
|
447 |
-
publisher = "Association for Computational Linguistics",
|
448 |
-
url = "https://arxiv.org/abs/1908.10084",
|
449 |
-
}
|
450 |
-
```
|
451 |
-
|
452 |
-
#### MultipleNegativesRankingLoss
|
453 |
-
```bibtex
|
454 |
-
@misc{henderson2017efficient,
|
455 |
-
title={Efficient Natural Language Response Suggestion for Smart Reply},
|
456 |
-
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
|
457 |
-
year={2017},
|
458 |
-
eprint={1705.00652},
|
459 |
-
archivePrefix={arXiv},
|
460 |
-
primaryClass={cs.CL}
|
461 |
-
}
|
462 |
```
|
463 |
|
464 |
-
<!--
|
465 |
-
## Glossary
|
466 |
|
467 |
-
|
468 |
-
-->
|
469 |
|
470 |
-
|
471 |
-
|
|
|
472 |
|
473 |
-
*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
|
474 |
-
-->
|
475 |
|
476 |
-
|
477 |
-
## Model Card Contact
|
478 |
|
479 |
-
|
480 |
-
-->
|
|
|
1 |
---
|
2 |
+
license: apache-2.0
|
3 |
tags:
|
4 |
+
- retrieval
|
5 |
+
- tv-show-recommendation
|
6 |
- sentence-transformers
|
7 |
+
- semantic-search
|
8 |
library_name: sentence-transformers
|
9 |
+
model-index:
|
10 |
+
- name: fine-tuned movie retriever
|
11 |
+
results:
|
12 |
+
- task:
|
13 |
+
type: retrieval
|
14 |
+
name: Information Retrieval
|
15 |
+
metrics:
|
16 |
+
- name: Recall@1
|
17 |
+
type: recall
|
18 |
+
value: 0.454
|
19 |
+
- name: Recall@3
|
20 |
+
type: recall
|
21 |
+
value: 0.676
|
22 |
+
- name: Recall@5
|
23 |
+
type: recall
|
24 |
+
value: 0.730
|
25 |
+
- name: Recall@10
|
26 |
+
type: recall
|
27 |
+
value: 0.797
|
28 |
+
metrics:
|
29 |
+
- recall
|
30 |
+
base_model:
|
31 |
+
- sentence-transformers/all-MiniLM-L6-v2
|
32 |
---
|
33 |
|
34 |
+
# 🎬 Fine-Tuned TV Show Retriever (Rich Semantic & Metadata Queries + Smart Negatives)
|
35 |
|
36 |
+
[](https://huggingface.co/your-username/my-st-model)
|
37 |
|
38 |
+
This is a custom fine-tuned sentence-transformer model designed for movie and TV recommendation systems. It is optimized for high-quality vector retrieval in a movie and TV show recommendation RAG pipeline. Fine-tuning used ~32K synthetic natural language queries spanning metadata-based and vibe-based prompts:
|
39 |
|
40 |
+
- Enriched vibe-style natural language queries (e.g., Emotionally powerful space exploration film with themes of love and sacrifice.)
|
41 |
+
- Metadata-based natural language queries (e.g., Any crime movies from the 1990s directed by Quentin Tarantino about heist?)
|
42 |
+
- Smarter negative sampling (genre contrast, theme mismatch, star-topic confusion)
|
43 |
+
- A dataset of over 32,000 triplets (query, positive doc, negative doc)
|
44 |
|
|
|
45 |
|
46 |
+
## 🧠 Training Details
|
|
|
|
|
47 |
|
48 |
+
- Base model: `sentence-transformers/all-MiniLM-L6-v2`
|
49 |
+
- Loss function: `MultipleNegativesRankingLoss`
|
50 |
+
- Epochs: 4
|
51 |
+
- Optimized for: top-k semantic retrieval in RAG systems
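As a rough illustration of what `MultipleNegativesRankingLoss` optimizes, the sketch below computes the loss for a batch of (query, positive, negative) triplets in plain Python. It assumes cosine similarity and a scale of 20.0, the values listed in the previous auto-generated card; the helper names `cosine` and `mnr_loss` are hypothetical and are not part of the sentence-transformers API, which handles this internally during training.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnr_loss(queries, positives, negatives, scale=20.0):
    """Mean cross-entropy over the batch (hypothetical helper).

    Query i must pick positives[i] out of every positive and every hard
    negative in the batch, so other queries' documents act as extra negatives.
    """
    candidates = positives + negatives
    total = 0.0
    for i, query in enumerate(queries):
        logits = [scale * cosine(query, doc) for doc in candidates]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)
```

The loss is near zero when each query embedding is closest to its own positive document, and grows when another document in the batch scores higher. That is why larger batches and harder negatives tend to give a stronger training signal.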
|
52 |
|
53 |
|
54 |
+
## 📈 Evaluation: Fine-tuned vs Base Model
|
55 |
|
56 |
+
| Metric | Fine-Tuned Model Score | Base Model Score |
|
57 |
+
|-------------|:----------------------:|:----------------:|
|
58 |
+
| Recall@1 | 0.454 | 0.133 |
|
59 |
+
| Recall@3 | 0.676 | 0.230 |
|
60 |
+
| Recall@5 | 0.730 | 0.279 |
|
61 |
+
| Recall@10 | 0.797 | 0.349 |
|
62 |
+
| MRR | 0.583 | 0.207 |
|
63 |
|
64 |
+
**Evaluation setup**:
|
65 |
+
- Dataset: 3,600 held-out metadata-based and vibe-style natural language queries
|
66 |
+
- Method: Top-k ranking using cosine similarity between query and positive documents
|
67 |
+
- Goal: Assess top-k retrieval quality in recommendation-like settings
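For readers who want to reproduce metrics of this kind, here is a minimal sketch of how Recall@k and mean reciprocal rank can be computed. It assumes that the table's last row reports mean reciprocal rank and that each query has exactly one positive document whose 1-based rank in the sorted results is already known. The function names and sample ranks are illustrative only and are not taken from the author's evaluation code.

```python
def recall_at_k(ranks, k):
    """Fraction of queries whose positive document appears in the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1 / rank of the positive document over all queries."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)


# Made-up ranks for four queries (1 = positive document ranked first).
example_ranks = [1, 3, 2, 12]

for k in (1, 3, 5, 10):
    print(f"Recall@{k}: {recall_at_k(example_ranks, k):.3f}")
print(f"MRR: {mean_reciprocal_rank(example_ranks):.3f}")
```

Recall@k only asks whether the positive document made the cut, while mean reciprocal rank also rewards placing it higher, so the two are usually reported together.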
|
68 |
|
|
|
|
|
|
|
69 |
|
70 |
+
## 📦 Usage
|
71 |
+
|
72 |
```python
|
73 |
from sentence_transformers import SentenceTransformer
|
74 |
|
75 |
+
model = SentenceTransformer("jjtsao/fine-tuned_tv_show_retriever")
|
76 |
+
query_embedding = model.encode("mind-bending sci-fi thrillers from the 2000s about identity")
|
77 |
```
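The snippet above stops at encoding the query. A possible next step, sketched here in plain Python, is ranking candidate documents by cosine similarity to the query embedding. The helper `top_k` and the toy vectors are hypothetical; in practice the vectors would come from `model.encode(...)` applied to the query and to each document's metadata text, and a vector database or the library's own similarity utilities would likely replace this loop.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return (document index, score) pairs for the k most similar documents."""
    scored = [(cosine(query_vec, doc), idx) for idx, doc in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


# Toy 2-dimensional vectors standing in for real 384-dimensional embeddings.
query_vec = [1.0, 0.0]
doc_vecs = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]

for idx, score in top_k(query_vec, doc_vecs, k=2):
    print(idx, round(score, 3))
```

The returned indices can then be mapped back to show titles and passed to the generation step of a RAG pipeline.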
|
78 |
|
|
|
|
|
79 |
|
80 |
+
## 🔍 Ideal Use Cases
|
|
|
81 |
|
82 |
+
- RAG-style movie recommendation apps
|
83 |
+
- Semantic filtering of large movie catalogs
|
84 |
+
- Query-document reranking pipelines
|
85 |
|
|
|
|
|
86 |
|
87 |
+
## 📜 License
|
|
|
88 |
|
89 |
+
Apache 2.0 — open for personal and commercial use.
|
|