dceshubh committed
Commit 06a1f7d · 1 Parent(s): 0cae3f6

Add README and requirements.txt

Files changed (2):
  1. README.md +75 -0
  2. requirements.txt +22 -0
README.md CHANGED
@@ -10,4 +10,79 @@ pinned: false
 short_description: A simple Multimodal RAG on top of H&M fashion data
 ---
 
+ # 👗 Fashion Multimodal RAG Assistant
+
+ This project implements a complete multimodal RAG (Retrieval-Augmented Generation) pipeline that searches fashion items with text or image queries, then generates helpful responses using an LLM.
+
+ ## 🔍 Features
+
+ - **Multimodal Search**: Search for fashion items using either text descriptions or image uploads
+ - **Vector Similarity**: Powered by CLIP embeddings for high-quality similarity matching
+ - **AI-Generated Recommendations**: Get personalized fashion recommendations based on your search
+ - **Interactive Web Interface**: Easy-to-use Gradio interface for a seamless experience
+
+ ## 🚀 How It Works
+
+ The pipeline consists of three main phases (phases 2 and 3 are sketched after this list):
+
+ 1. **Retrieval**: Finds similar fashion items via vector search over CLIP embeddings
+ 2. **Augmentation**: Builds enhanced prompts from the retrieved fashion-database context
+ 3. **Generation**: Produces helpful, creative responses with the instruction-tuned Qwen2.5-0.5B-Instruct LLM
+
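An illustrative sketch of phases 2 and 3, assuming retrieved items each carry a `text` caption field; this is a reconstruction for readers, not the commit's actual `app.py`:

```python
# Illustrative sketch of augmentation + generation; not the commit's app.py.
# Assumes `hits` is a list of retrieved items with a "text" caption field.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

def generate_recommendation(query: str, hits: list[dict]) -> str:
    # Phase 2 (augmentation): fold the retrieved captions into the prompt.
    context = "\n".join(f"- {h['text']}" for h in hits)
    messages = [
        {"role": "system", "content": "You are a helpful fashion assistant."},
        {"role": "user", "content": f"Relevant items:\n{context}\n\nQuery: {query}"},
    ]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    # Phase 3 (generation): produce the recommendation with Qwen.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=256)
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```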
+ ## 📊 Dataset
+
+ The project uses the H&M Fashion Caption Dataset:
+ - 20K+ fashion items with images and text descriptions
+ - Source: [H&M Fashion Caption Dataset on HuggingFace](https://huggingface.co/datasets/tomytjandra/h-and-m-fashion-caption)
+
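Loading the dataset takes one call; a minimal peek (the `image`/`text` column names follow the dataset card):

```python
from datasets import load_dataset

# Pull the H&M fashion caption dataset from the Hugging Face Hub.
ds = load_dataset("tomytjandra/h-and-m-fashion-caption", split="train")
print(len(ds))        # ~20K items
print(ds[0]["text"])  # caption for the first item; "image" holds the PIL image
```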
+ ## 🔧 Technical Details
+
+ - **Vector Database**: LanceDB for efficient similarity search
+ - **Embedding Model**: CLIP for multimodal embeddings
+ - **LLM**: Qwen/Qwen2.5-0.5B-Instruct for response generation
+ - **Web Interface**: Gradio for an interactive user experience
+
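A minimal retrieval sketch tying these pieces together; the table name `fashion_items`, the local DB path, and the `text` column are assumptions, not the commit's verified schema:

```python
# Retrieval sketch with open-clip + LanceDB; schema names are assumptions.
import lancedb
import open_clip
import torch

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

def embed_text(query: str) -> list[float]:
    # Encode and L2-normalize so similarity search behaves like cosine.
    with torch.no_grad():
        feats = model.encode_text(tokenizer([query]))
        feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats[0].tolist()

db = lancedb.connect("./lancedb")        # local DB path is an assumption
table = db.open_table("fashion_items")   # hypothetical table name
df = table.search(embed_text("black dress for evening")).limit(5).to_pandas()
print(df["text"].tolist())               # captions of the top-5 matches
```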
+ ## 💻 Usage
+
+ You can interact with the application in two ways:
+
+ ### Web Interface
+ The app comes with a Gradio web interface for easy interaction (its rough shape is sketched after this block):
+ ```
+ python app.py --app
+ ```
+
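The rough shape of such an interface, as a hedged sketch; the input/output names and title are placeholders, not the commit's exact layout:

```python
import gradio as gr

def search(text_query, image_query):
    # Placeholder: wire the retrieval + generation pipeline in here.
    return f"Recommendations for: {text_query or 'uploaded image'}"

demo = gr.Interface(
    fn=search,
    inputs=[gr.Textbox(label="Text query"), gr.Image(type="pil", label="Image query")],
    outputs=gr.Textbox(label="Recommendation"),
    title="Fashion Multimodal RAG Assistant",
)
demo.launch()
```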
+ ### Command Line
+ You can also run specific queries from the command line (the text-vs-image dispatch is sketched after this block):
+ ```
+ # Text query
+ python app.py --query "black dress for evening"
+
+ # Image query (if you have an image file)
+ python app.py --query "path/to/fashion/image.jpg"
+ ```
+
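Since the same `--query` flag accepts both forms, one plausible dispatch rule (a guess, not verified against `app.py`) is to treat any argument that resolves to a readable file as an image:

```python
import os
from PIL import Image

def parse_query(q: str):
    # Heuristic sketch: path to an existing file -> image query; else -> text.
    if os.path.isfile(q):
        return Image.open(q).convert("RGB")
    return q
```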
+ ## 🛠️ Installation
+
+ To run this project locally:
+
+ 1. Clone the repository
+ 2. Install dependencies:
+ ```
+ pip install -r requirements.txt
+ ```
+ 3. Run the application:
+ ```
+ python app.py --app
+ ```
+
+ ## 📝 License
+
+ This project uses the H&M Fashion Caption Dataset, which is publicly available on HuggingFace.
+
+ ## 🙏 Acknowledgements
+
+ - H&M Fashion Caption Dataset by [tomytjandra](https://huggingface.co/datasets/tomytjandra/h-and-m-fashion-caption)
+ - Built with LanceDB, CLIP, and Qwen LLM
+
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
requirements.txt ADDED
@@ -0,0 +1,22 @@
+ # Core dependencies
+ torch>=2.0.0
+ transformers>=4.30.0
+ datasets>=2.12.0
+ pandas>=2.0.0
+ Pillow>=9.5.0
+
+ # Database and embeddings
+ lancedb>=0.3.0
+ pydantic>=1.10.8
+
+ # Web interface
+ gradio>=3.35.0
+
+ # Utilities
+ numpy>=1.24.0
+ scikit-learn>=1.2.0
+ sentence-transformers>=2.2.2
+
+ # For CLIP embeddings
+ open-clip-torch>=2.20.0
+ ftfy>=6.1.1
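After installing, a quick sanity check that the key pinned packages import cleanly (a convenience snippet, not part of the commit):

```python
# Verifies the core stack from requirements.txt is importable.
import datasets
import gradio
import lancedb
import open_clip  # import name for the open-clip-torch package
import torch
import transformers

print("torch", torch.__version__)
print("transformers", transformers.__version__)
print("gradio", gradio.__version__)
```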