Spaces:
Running
Running
metadata
title: Multimodal Rag Hm
emoji: π
colorFrom: purple
colorTo: yellow
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
short_description: A simple Multimodal RAG on top of H&M fashion data
π Fashion Multimodal RAG Assistant
This project implements a complete multimodal RAG (Retrieval-Augmented Generation) pipeline that can search through fashion items using both text and image queries, then generate helpful responses using an LLM.
π Features
- Multimodal Search: Search for fashion items using either text descriptions or image uploads
- Vector Similarity: Powered by CLIP embeddings for high-quality similarity matching
- AI-Generated Recommendations: Get personalized fashion recommendations based on your search
- Interactive Web Interface: Easy-to-use Gradio interface for a seamless experience
π How It Works
The pipeline consists of three main phases:
- Retrieval: Finds similar fashion items using vector search with CLIP embeddings
- Augmentation: Creates enhanced prompts with retrieved context from the fashion database
- Generation: Generates helpful, creative responses using a fine-tuned LLM (Qwen2.5-0.5B-Instruct)
π Dataset
The project uses the H&M Fashion Caption Dataset:
- 20K+ fashion items with images and text descriptions
- Source: H&M Fashion Caption Dataset on HuggingFace
π§ Technical Details
- Vector Database: LanceDB for efficient similarity search
- Embedding Model: CLIP for multimodal embeddings
- LLM: Qwen/Qwen2.5-0.5B-Instruct for response generation
- Web Interface: Gradio for interactive user experience
π» Usage
You can interact with the application in two ways:
Web Interface
The app comes with a Gradio web interface for easy interaction:
python app.py --app
Command Line
You can also use the command line for specific queries:
# Text query
python app.py --query "black dress for evening"
# Image query (if you have an image file)
python app.py --query "path/to/fashion/image.jpg"
π οΈ Installation
To run this project locally:
- Clone the repository
- Install dependencies:
pip install -r requirements.txt
- Run the application:
python app.py --app
π License
This project uses the H&M Fashion Caption Dataset which is publicly available on HuggingFace.
π Acknowledgements
- H&M Fashion Dataset by tomytjandra
- Built with LanceDB, CLIP, and Qwen LLM
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference