dceshubh committed
Commit 06a1f7d · 1 Parent(s): 0cae3f6

Add README and requirements.txt

Files changed (2):
  1. README.md +75 -0
  2. requirements.txt +22 -0
README.md CHANGED
@@ -10,4 +10,79 @@ pinned: false
 short_description: A simple Multimodal RAG on top of H&M fashion data
 ---
 
+ # 👗 Fashion Multimodal RAG Assistant
+
+ This project implements a complete multimodal RAG (Retrieval-Augmented Generation) pipeline that searches fashion items with text or image queries, then generates helpful responses using an LLM.
+
+ ## 🔍 Features
+
+ - **Multimodal Search**: Search for fashion items using either text descriptions or image uploads
+ - **Vector Similarity**: Powered by CLIP embeddings for high-quality similarity matching
+ - **AI-Generated Recommendations**: Get personalized fashion recommendations based on your search
+ - **Interactive Web Interface**: Easy-to-use Gradio interface for a seamless experience
+
+ ## 🚀 How It Works
+
+ The pipeline consists of three main phases (phases 2 and 3 are sketched after this list):
+
+ 1. **Retrieval**: Finds similar fashion items via vector search over CLIP embeddings
+ 2. **Augmentation**: Builds enhanced prompts from the retrieved fashion-database context
+ 3. **Generation**: Produces helpful, creative responses with the instruction-tuned Qwen2.5-0.5B-Instruct LLM
+
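An illustrative sketch of phases 2 and 3, assuming retrieved items each carry a `text` caption field; this is a reconstruction for readers, not the commit's actual `app.py`:

```python
# Illustrative sketch of augmentation + generation; not the commit's app.py.
# Assumes `hits` is a list of retrieved items with a "text" caption field.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

def generate_recommendation(query: str, hits: list[dict]) -> str:
    # Phase 2 (augmentation): fold the retrieved captions into the prompt.
    context = "\n".join(f"- {h['text']}" for h in hits)
    messages = [
        {"role": "system", "content": "You are a helpful fashion assistant."},
        {"role": "user", "content": f"Relevant items:\n{context}\n\nQuery: {query}"},
    ]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    # Phase 3 (generation): produce the recommendation with Qwen.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=256)
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```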
+ ## 📊 Dataset
+
+ The project uses the H&M Fashion Caption Dataset:
+ - 20K+ fashion items with images and text descriptions
+ - Source: [H&M Fashion Caption Dataset on HuggingFace](https://huggingface.co/datasets/tomytjandra/h-and-m-fashion-caption)
+
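Loading the dataset takes one call; a minimal peek (the `image`/`text` column names follow the dataset card):

```python
from datasets import load_dataset

# Pull the H&M fashion caption dataset from the Hugging Face Hub.
ds = load_dataset("tomytjandra/h-and-m-fashion-caption", split="train")
print(len(ds))        # ~20K items
print(ds[0]["text"])  # caption for the first item; "image" holds the PIL image
```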
+ ## 🔧 Technical Details
+
+ - **Vector Database**: LanceDB for efficient similarity search
+ - **Embedding Model**: CLIP for multimodal embeddings
+ - **LLM**: Qwen/Qwen2.5-0.5B-Instruct for response generation
+ - **Web Interface**: Gradio for an interactive user experience
+
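A minimal retrieval sketch tying these pieces together; the table name `fashion_items`, the local DB path, and the `text` column are assumptions, not the commit's verified schema:

```python
# Retrieval sketch with open-clip + LanceDB; schema names are assumptions.
import lancedb
import open_clip
import torch

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

def embed_text(query: str) -> list[float]:
    # Encode and L2-normalize so similarity search behaves like cosine.
    with torch.no_grad():
        feats = model.encode_text(tokenizer([query]))
        feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats[0].tolist()

db = lancedb.connect("./lancedb")        # local DB path is an assumption
table = db.open_table("fashion_items")   # hypothetical table name
df = table.search(embed_text("black dress for evening")).limit(5).to_pandas()
print(df["text"].tolist())               # captions of the top-5 matches
```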
+ ## 💻 Usage
+
+ You can interact with the application in two ways:
+
+ ### Web Interface
+ The app comes with a Gradio web interface for easy interaction (its rough shape is sketched after this block):
+ ```
+ python app.py --app
+ ```
+
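The rough shape of such an interface, as a hedged sketch; the input/output names and title are placeholders, not the commit's exact layout:

```python
import gradio as gr

def search(text_query, image_query):
    # Placeholder: wire the retrieval + generation pipeline in here.
    return f"Recommendations for: {text_query or 'uploaded image'}"

demo = gr.Interface(
    fn=search,
    inputs=[gr.Textbox(label="Text query"), gr.Image(type="pil", label="Image query")],
    outputs=gr.Textbox(label="Recommendation"),
    title="Fashion Multimodal RAG Assistant",
)
demo.launch()
```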
+ ### Command Line
+ You can also run specific queries from the command line (the text-vs-image dispatch is sketched after this block):
+ ```
+ # Text query
+ python app.py --query "black dress for evening"
+
+ # Image query (if you have an image file)
+ python app.py --query "path/to/fashion/image.jpg"
+ ```
+
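Since the same `--query` flag accepts both forms, one plausible dispatch rule (a guess, not verified against `app.py`) is to treat any argument that resolves to a readable file as an image:

```python
import os
from PIL import Image

def parse_query(q: str):
    # Heuristic sketch: path to an existing file -> image query; else -> text.
    if os.path.isfile(q):
        return Image.open(q).convert("RGB")
    return q
```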
+ ## 🛠️ Installation
+
+ To run this project locally:
+
+ 1. Clone the repository
+ 2. Install dependencies:
+ ```
+ pip install -r requirements.txt
+ ```
+ 3. Run the application:
+ ```
+ python app.py --app
+ ```
+
+ ## 📝 License
+
+ This project uses the H&M Fashion Caption Dataset, which is publicly available on HuggingFace.
+
+ ## 🙏 Acknowledgements
+
+ - H&M Fashion Caption Dataset by [tomytjandra](https://huggingface.co/datasets/tomytjandra/h-and-m-fashion-caption)
+ - Built with LanceDB, CLIP, and Qwen LLM
+
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
requirements.txt ADDED
@@ -0,0 +1,22 @@
+ # Core dependencies
+ torch>=2.0.0
+ transformers>=4.30.0
+ datasets>=2.12.0
+ pandas>=2.0.0
+ Pillow>=9.5.0
+
+ # Database and embeddings
+ lancedb>=0.3.0
+ pydantic>=1.10.8
+
+ # Web interface
+ gradio>=3.35.0
+
+ # Utilities
+ numpy>=1.24.0
+ scikit-learn>=1.2.0
+ sentence-transformers>=2.2.2
+
+ # For CLIP embeddings
+ open-clip-torch>=2.20.0
+ ftfy>=6.1.1
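After installing, a quick sanity check that the key pinned packages import cleanly (a convenience snippet, not part of the commit):

```python
# Verifies the core stack from requirements.txt is importable.
import datasets
import gradio
import lancedb
import open_clip  # import name for the open-clip-torch package
import torch
import transformers

print("torch", torch.__version__)
print("transformers", transformers.__version__)
print("gradio", gradio.__version__)
```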