Add README and requirements.txt

- README.md +75 -0
- requirements.txt +22 -0

README.md (changed)

`@@ -10,4 +10,79 @@ pinned: false`

    short_description: A simple Multimodal RAG on top of H&M fashion data
    ---

# Fashion Multimodal RAG Assistant

This project implements a complete multimodal RAG (Retrieval-Augmented Generation) pipeline that searches fashion items using either text or image queries and then generates helpful responses with an LLM.

## Features

- **Multimodal Search**: Search for fashion items using either a text description or an uploaded image
- **Vector Similarity**: Queries and items are embedded with CLIP and matched by vector similarity
- **AI-Generated Recommendations**: Get personalized fashion recommendations based on your search
- **Interactive Web Interface**: A Gradio interface for running searches and viewing results in the browser

## How It Works

The pipeline consists of three main phases:

1. **Retrieval**: Finds similar fashion items using vector search over CLIP embeddings
2. **Augmentation**: Builds an enhanced prompt that includes the retrieved items from the fashion database as context
3. **Generation**: Generates a helpful, creative response using an instruction-tuned LLM (Qwen2.5-0.5B-Instruct)
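
The three phases above can be sketched as follows. This is a minimal illustration and is not taken from the project's `app.py`. The names `build_prompt` and `rag_answer`, the `text` key on retrieved items, and the prompt wording are all hypothetical. `retrieve` and `generate` are passed in as plain callables so the orchestration runs without CLIP, LanceDB, or Qwen. In the real app they would wrap the vector search and the LLM call.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical item shape: each retrieved item carries a caption under "text".
Item = Dict[str, str]


def build_prompt(query: str, items: Sequence[Item]) -> str:
    """Augmentation: fold the retrieved captions into the LLM prompt."""
    context = "\n".join(
        f"{i}. {item['text']}" for i, item in enumerate(items, start=1)
    )
    return (
        "You are a helpful fashion assistant. Use the retrieved items to answer.\n"
        f"Retrieved items:\n{context}\n\n"
        f"User query: {query}\n"
        "Answer:"
    )


def rag_answer(
    query: str,
    retrieve: Callable[[str, int], List[Item]],
    generate: Callable[[str], str],
    k: int = 3,
) -> str:
    """Run retrieval, augmentation, and generation in order."""
    items = retrieve(query, k)           # 1. Retrieval (e.g. vector search)
    prompt = build_prompt(query, items)  # 2. Augmentation
    return generate(prompt)              # 3. Generation (e.g. an LLM call)


# Usage with stand-in callables; the echo "generator" just returns the prompt.
catalog = [{"text": "black satin evening dress"}, {"text": "black wrap midi dress"}]
print(rag_answer("black dress for evening", lambda q, k: catalog[:k], lambda p: p, k=2))
```

Keeping retrieval and generation behind callables lets you swap the toy stand-ins for real components without touching the prompt-building step.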

## Dataset

The project uses the H&M Fashion Caption Dataset:

- 20K+ fashion items with images and text descriptions
- Source: [H&M Fashion Caption Dataset on HuggingFace](https://huggingface.co/datasets/tomytjandra/h-and-m-fashion-caption)

## Technical Details

- **Vector Database**: LanceDB for efficient similarity search
- **Embedding Model**: CLIP for multimodal (text and image) embeddings
- **LLM**: Qwen/Qwen2.5-0.5B-Instruct for response generation
- **Web Interface**: Gradio for an interactive user experience
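
The retrieval step that LanceDB handles can be approximated with a brute-force cosine-similarity search in plain Python. This is only a sketch under stated assumptions. The toy 3-dimensional vectors stand in for real CLIP embeddings, which have hundreds of dimensions. `l2_normalize` and `top_k` are hypothetical helper names. LanceDB's own search may use a different distance metric or an approximate index, depending on how the table is configured.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(
    query: Sequence[float], items: Dict[str, Sequence[float]], k: int
) -> List[Tuple[float, str]]:
    """Return the k (score, item_id) pairs with the highest cosine similarity."""
    q = l2_normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, l2_normalize(vec))), item_id)
        for item_id, vec in items.items()
    )
    return heapq.nlargest(k, scored)


# Toy "embeddings": in the real pipeline these would come from CLIP, and the
# query vector could come from either the text encoder or the image encoder.
catalog = {
    "evening dress": [0.9, 0.1, 0.0],
    "denim jacket": [0.0, 0.2, 0.9],
    "cocktail dress": [0.8, 0.3, 0.1],
}
for score, item_id in top_k([1.0, 0.2, 0.0], catalog, k=2):
    print(f"{item_id}: {score:.3f}")
```

CLIP maps text and images into a shared embedding space, so the same search serves both query types. Only the encoder that produces the query vector changes.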

## Usage

You can interact with the application in two ways:

### Web Interface

The app comes with a Gradio web interface for easy interaction:

```
python app.py --app
```

### Command Line

You can also use the command line for specific queries:

```
# Text query
python app.py --query "black dress for evening"

# Image query (if you have an image file)
python app.py --query "path/to/fashion/image.jpg"
```
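
The README shows the `--app` and `--query` flags but not how `app.py` parses them. Below is a minimal `argparse` sketch of one way to do it. It assumes the two flags are mutually exclusive and that a query counts as an image when its file extension looks like an image format. `parse_args`, `query_kind`, and `IMAGE_SUFFIXES` are hypothetical names, not taken from the project.

```python
import argparse
from pathlib import Path
from typing import List, Optional

# Assumption: extension-based detection; the real app may decide differently.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the two modes shown in the README: --app or --query."""
    parser = argparse.ArgumentParser(description="Fashion Multimodal RAG Assistant (sketch)")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--app", action="store_true", help="launch the Gradio web interface")
    mode.add_argument("--query", help="a text query or a path to an image file")
    return parser.parse_args(argv)


def query_kind(query: str) -> str:
    """Classify a --query value as 'image' or 'text' by its file extension."""
    return "image" if Path(query).suffix.lower() in IMAGE_SUFFIXES else "text"


args = parse_args(["--query", "path/to/fashion/image.jpg"])
print(query_kind(args.query))  # image
```

A real implementation would also check that an image path exists before encoding it. If the path does not exist, it would either treat the string as text or report an error.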

## Installation

To run this project locally:

1. Clone the repository
2. Install dependencies:
   ```
   pip install -r requirements.txt
   ```
3. Run the application:
   ```
   python app.py --app
   ```

## License

This project uses the H&M Fashion Caption Dataset, which is publicly available on HuggingFace.

## Acknowledgements

- H&M Fashion Dataset by [tomytjandra](https://huggingface.co/datasets/tomytjandra/h-and-m-fashion-caption)
- Built with LanceDB, CLIP, and Qwen LLM

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

requirements.txt (added)

`@@ -0,0 +1,22 @@`

    # Core dependencies
    torch>=2.0.0
    transformers>=4.30.0
    datasets>=2.12.0
    pandas>=2.0.0
    Pillow>=9.5.0

    # Database and embeddings
    lancedb>=0.3.0
    pydantic>=1.10.8

    # Web interface
    gradio>=3.35.0

    # Utilities
    numpy>=1.24.0
    scikit-learn>=1.2.0
    sentence-transformers>=2.2.2

    # For CLIP embeddings
    open-clip-torch>=2.20.0
    ftfy>=6.1.1
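
The file pins only minimum versions with `>=`. If you want to confirm that an existing environment meets them before running `app.py`, a small standard-library sketch like the one below can help. It assumes every non-comment line has the simple `name>=version` form used here. Its version comparison is deliberately naive (numeric components only); the `packaging` library's `Version` class is the more robust choice. The helper names are hypothetical.

```python
import re
from importlib import metadata
from typing import Dict, List, Optional, Tuple


def parse_min_versions(text: str) -> Dict[str, Optional[str]]:
    """Map package name -> minimum version from simple 'name>=version' lines."""
    reqs: Dict[str, Optional[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, sep, version = line.partition(">=")
        reqs[name.strip()] = version.strip() if sep else None
    return reqs


def version_tuple(version: str) -> Tuple[int, ...]:
    """Naive parse: keep leading numeric parts, e.g. '2.0.0+cu118' -> (2, 0, 0)."""
    parts: List[int] = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_environment(reqs: Dict[str, Optional[str]]) -> List[str]:
    """Return human-readable problems; an empty list means all minimums are met."""
    problems: List[str] = []
    for name, minimum in reqs.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if minimum and version_tuple(installed) < version_tuple(minimum):
            problems.append(f"{name}: {installed} < {minimum}")
    return problems


# Usage on an inline excerpt; pass the contents of requirements.txt in practice.
sample = "# Core dependencies\ntorch>=2.0.0\nnumpy>=1.24.0\n"
print(check_environment(parse_min_versions(sample)))
```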