|
---
license: apache-2.0
tags:
- retrieval
- tv-show-recommendation
- sentence-transformers
- semantic-search
library_name: sentence-transformers
model-index:
- name: fine-tuned movie retriever
  results:
  - task:
      type: retrieval
      name: Information Retrieval
    metrics:
    - name: Recall@1
      type: recall
      value: 0.454
    - name: Recall@3
      type: recall
      value: 0.676
    - name: Recall@5
      type: recall
      value: 0.730
    - name: Recall@10
      type: recall
      value: 0.797
metrics:
- recall
base_model:
- sentence-transformers/all-MiniLM-L6-v2
---
|
|
|
# 🎬 Fine-Tuned TV Show Retriever (Rich Semantic & Metadata Queries + Smart Negatives)
|
|
|
|
|
|
This is a custom fine-tuned sentence-transformers model for movie and TV show recommendation systems, optimized for high-quality vector retrieval in a recommendation RAG pipeline. Fine-tuning used ~32K synthetic natural-language queries spanning metadata-based and vibe-based prompts:
|
|
|
- Enriched vibe-style natural language queries (e.g., "Emotionally powerful space exploration film with themes of love and sacrifice.")

- Metadata-based natural language queries (e.g., "Any crime movies from the 1990s directed by Quentin Tarantino about a heist?")

- Smarter negative sampling (genre contrast, theme mismatch, star-topic confusion)

- A dataset of over 32,000 triplets (query, positive doc, negative doc), illustrated in the sketch below
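
For illustration, a single training triplet might look like the following. The documents shown here are hypothetical and are not taken from the actual dataset.

```python
# Hypothetical triplet in the (query, positive doc, negative doc) format described above.
triplet = {
    "query": "Emotionally powerful space exploration film with themes of love and sacrifice.",
    # Positive: matches both the vibe (emotional, sacrifice) and the setting (space exploration).
    "positive": "Interstellar (2014): astronauts cross a wormhole to save humanity; a father's love anchors the story.",
    # Hard negative: shares the space setting but mismatches the tone and themes.
    "negative": "Spaceballs (1987): a slapstick parody of space-opera blockbusters.",
}
```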
|
|
|
|
|
## 🧠 Training Details
|
|
|
- Base model: `sentence-transformers/all-MiniLM-L6-v2` |
|
- Loss function: `MultipleNegativesRankingLoss` |
|
- Epochs: 4 |
|
- Optimized for: top-k semantic retrieval in RAG systems |
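
The card does not include the training script; below is a minimal sketch of how a comparable triplet fine-tune could be set up with `sentence-transformers` (the toy example, batch size, and warmup steps are assumptions).

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Toy triplets; the real run used ~32K (query, positive, negative) examples.
train_examples = [
    InputExample(texts=[
        "Any crime movies from the 1990s directed by Quentin Tarantino about a heist?",      # query
        "Reservoir Dogs (1992): a botched diamond heist told through its unraveling crew.",  # positive
        "You've Got Mail (1998): a 1990s romantic comedy about rival booksellers.",          # negative
    ]),
]

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=64)
# MultipleNegativesRankingLoss also treats other in-batch positives as additional negatives.
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=4,          # matches the card
    warmup_steps=100,  # assumption
)
```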
|
|
|
|
|
## 📊 Evaluation: Fine-tuned vs Base Model
|
|
|
| Metric | Fine-Tuned Model Score | Base Model Score | |
|
|-------------|:----------------------:|:----------------:| |
|
| Recall@1 | 0.454 | 0.133 | |
|
| Recall@3 | 0.676 | 0.230 | |
|
| Recall@5 | 0.730 | 0.279 | |
|
| Recall@10 | 0.797 | 0.349 | |
|
| MRR | 0.583 | 0.207 | |
|
|
|
**Evaluation setup**: |
|
- Dataset: 3,600 held-out metadata and vibe-style natural queries |
|
- Method: candidate documents ranked by cosine similarity to each query; a query counts as a hit when its positive document appears in the top k
|
- Goal: Assess top-k retrieval quality in recommendation-like settings |
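
As a rough sketch of this protocol (the queries, corpus, and gold labels below are placeholders, not the held-out evaluation set):

```python
import numpy as np
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("jjtsao/fine-tuned_tv_show_retriever")

queries = ["mind-bending sci-fi thrillers from the 2000s about identity"]
documents = [
    "Memento (2000): a man with anterograde amnesia hunts his wife's killer.",  # positive for query 0
    "The Notebook (2004): a decades-spanning romance told through a journal.",
]
gold = [0]  # index of each query's positive document

q_emb = model.encode(queries, convert_to_tensor=True)
d_emb = model.encode(documents, convert_to_tensor=True)
ranks = np.argsort(-util.cos_sim(q_emb, d_emb).cpu().numpy(), axis=1)  # best-to-worst doc ids per query

for k in (1, 3, 5, 10):
    recall = np.mean([g in r[:k] for g, r in zip(gold, ranks)])
    print(f"Recall@{k}: {recall:.3f}")

mrr = np.mean([1.0 / (np.where(r == g)[0][0] + 1) for g, r in zip(gold, ranks)])
print(f"MRR: {mrr:.3f}")
```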
|
|
|
|
|
## 📦 Usage
|
|
|
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jjtsao/fine-tuned_tv_show_retriever")
query_embedding = model.encode("mind-bending sci-fi thrillers from the 2000s about identity")
```
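
For end-to-end retrieval over a catalog, the same embeddings can be fed to `sentence_transformers.util.semantic_search`; the catalog entries below are placeholders for your own show/movie documents.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("jjtsao/fine-tuned_tv_show_retriever")

# Placeholder catalog; in practice, embed your full set of documents once and cache them.
catalog = [
    "Severance (2022): office workers surgically split their work and personal memories.",
    "The Office (2005): mockumentary sitcom set in a struggling paper-company branch.",
]
catalog_embeddings = model.encode(catalog, convert_to_tensor=True)

query_embedding = model.encode(
    "mind-bending sci-fi thrillers from the 2000s about identity", convert_to_tensor=True
)
hits = util.semantic_search(query_embedding, catalog_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {catalog[hit['corpus_id']]}")
```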
|
|
|
|
|
## 🎯 Ideal Use Cases
|
|
|
- RAG-style movie and TV show recommendation apps

- Semantic filtering of large movie and TV catalogs

- Query-document reranking pipelines
|
|
|
|
|
## 📄 License
|
|
|
Apache 2.0 – open for personal and commercial use.