---
title: Multimodal Rag Hm
emoji: 👀
colorFrom: purple
colorTo: yellow
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
short_description: A simple Multimodal RAG on top of H&M fashion data
---

# 👗 Fashion Multimodal RAG Assistant

This project implements a complete multimodal RAG (Retrieval-Augmented Generation) pipeline that searches fashion items using either text or image queries and then generates helpful responses with an LLM.

πŸ” Features

  • Multimodal Search: Search for fashion items using either text descriptions or image uploads
  • Vector Similarity: Powered by CLIP embeddings for high-quality similarity matching
  • AI-Generated Recommendations: Get personalized fashion recommendations based on your search
  • Interactive Web Interface: Easy-to-use Gradio interface for a seamless experience

## 🚀 How It Works

The pipeline consists of three main phases (sketched in code below):

1. **Retrieval**: finds similar fashion items using vector search over CLIP embeddings
2. **Augmentation**: builds an enriched prompt from the context retrieved from the fashion database
3. **Generation**: produces a helpful, creative response with an instruction-tuned LLM (Qwen2.5-0.5B-Instruct)
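
A minimal end-to-end sketch of these three phases, assuming an existing LanceDB table named `fashion_items` with a `vector` column of CLIP embeddings and a `text` caption column. The database path, table and column names, and the CLIP checkpoint are illustrative assumptions, not the exact code in `app.py`:

```python
import lancedb
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, CLIPModel, CLIPProcessor

# --- Phase 1: Retrieval — embed the query with CLIP and search LanceDB ---
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_text(query: str):
    inputs = clip_processor(text=[query], return_tensors="pt", padding=True)
    with torch.no_grad():
        return clip.get_text_features(**inputs)[0].numpy()

db = lancedb.connect("./lancedb")        # assumed database path
table = db.open_table("fashion_items")   # assumed table name
query = "black dress for evening"
hits = table.search(embed_text(query)).limit(3).to_list()

# --- Phase 2: Augmentation — build a prompt from the retrieved captions ---
context = "\n".join(f"- {hit['text']}" for hit in hits)   # assumed caption column
prompt = (
    "You are a helpful fashion assistant. These catalogue items were retrieved "
    f"for the user's request:\n{context}\n\n"
    f"Recommend an outfit for: {query}"
)

# --- Phase 3: Generation — answer with Qwen2.5-0.5B-Instruct ---
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
llm = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
)
output = llm.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```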

## 📊 Dataset

The project uses the H&M Fashion Caption Dataset by tomytjandra, which pairs H&M product images with text captions and is publicly available on HuggingFace.
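
A quick way to peek at the data with the `datasets` library; the dataset id and column names below are assumptions based on the acknowledgements, not values taken from `app.py`:

```python
from datasets import load_dataset

# Assumed dataset id; adjust if the Space pins a different copy.
ds = load_dataset("tomytjandra/h-and-m-fashion-caption", split="train")

print(ds)                          # row count and column names
print(ds[0]["text"])               # caption of the first item (assumed column name)
ds[0]["image"].save("sample.jpg")  # PIL image of the first item (assumed column name)
```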

## 🔧 Technical Details

- **Vector Database**: LanceDB for efficient similarity search
- **Embedding Model**: CLIP for multimodal embeddings
- **LLM**: Qwen/Qwen2.5-0.5B-Instruct for response generation
- **Web Interface**: Gradio for an interactive user experience
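
For illustration, one way such an index could be built: encode each product image with CLIP and write the vectors plus captions into a LanceDB table. The dataset id, column names, and table name are assumptions rather than the Space's exact code:

```python
import lancedb
import torch
from datasets import load_dataset
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Assumed dataset id (see the Dataset section above).
ds = load_dataset("tomytjandra/h-and-m-fashion-caption", split="train")

rows = []
for item in ds.select(range(1000)):   # index a small subset for a quick demo
    inputs = clip_processor(images=item["image"], return_tensors="pt")
    with torch.no_grad():
        vector = clip.get_image_features(**inputs)[0].numpy()
    rows.append({"vector": vector, "text": item["text"]})

db = lancedb.connect("./lancedb")                              # assumed database path
db.create_table("fashion_items", data=rows, mode="overwrite")  # assumed table name
```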

## 💻 Usage

You can interact with the application in two ways:

### Web Interface

The app comes with a Gradio web interface for easy interaction:

```bash
python app.py --app
```
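
A minimal sketch of how such a Gradio interface could be wired up; the two helper stubs stand in for the retrieval and generation steps described above, and none of the names are taken from the actual `app.py`:

```python
import gradio as gr

def retrieve_similar_items(query):
    # Stand-in for the CLIP + LanceDB vector search described above.
    return ["(captions of retrieved items would appear here)"]

def generate_recommendation(query, items):
    # Stand-in for the Qwen2.5-0.5B-Instruct generation step.
    return f"Recommendation based on {len(items)} retrieved items."

def search_and_recommend(text_query, image_query):
    # Prefer the uploaded image when both inputs are given.
    query = image_query if image_query is not None else text_query
    items = retrieve_similar_items(query)
    return "\n".join(items), generate_recommendation(query, items)

demo = gr.Interface(
    fn=search_and_recommend,
    inputs=[gr.Textbox(label="Text query"), gr.Image(label="Image query", type="pil")],
    outputs=[gr.Textbox(label="Retrieved items"), gr.Textbox(label="Recommendation")],
    title="Fashion Multimodal RAG Assistant",
)

if __name__ == "__main__":
    demo.launch()
```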

### Command Line

You can also use the command line for specific queries:

```bash
# Text query
python app.py --query "black dress for evening"

# Image query (if you have an image file)
python app.py --query "path/to/fashion/image.jpg"
```

πŸ› οΈ Installation

To run this project locally:

1. Clone the repository
2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Run the application:

   ```bash
   python app.py --app
   ```

πŸ“ License

This project uses the H&M Fashion Caption Dataset which is publicly available on HuggingFace.

πŸ™ Acknowledgements

  • H&M Fashion Dataset by tomytjandra
  • Built with LanceDB, CLIP, and Qwen LLM

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference