metadata
title: Visual Search System
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.37.0
app_file: app.py
pinned: false
license: mit

🔍 Visual Search System

A Streamlit app that downloads images from photos_url.csv, builds lightweight visual embeddings, and lets you search by text (optional, via Hugging Face Inference API) or by uploading an image.

✨ Features

  • 📥 Automatic downloads: Pulls images from photos_url.csv with retries and optimization
  • 🧠 Embeddings: Creates simple, robust RGB histogram embeddings locally (no GPU needed)
  • 🔤 Text search (optional): Uses openai/clip-vit-base-patch32 via HF Inference API when HF_TOKEN is provided
  • Image similarity search: Upload an image and find visually similar images
  • 📱 Modern UI: Streamlit interface with responsive layout and status tracking

🚀 Quick Start

Local Development

  1. Clone the repository:

    git clone <your-repo-url>
    cd visual-search-system
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Run the app:

    streamlit run app.py
    

Hugging Face Spaces Deployment

  1. Create a new Space and select the SDK: Streamlit.
  2. Ensure this repository contains at least: app.py, photos_url.csv, requirements.txt, README.md.
  3. Optional: Set a Space secret named HF_TOKEN if you want text search enabled.
    • In your Space, go to Settings → Secrets → Add HF_TOKEN (a valid Hugging Face token).
  4. Push/Upload files. The build will install requirements.txt and start app.py automatically.

Notes:

  • You do NOT need a Dockerfile for Streamlit Spaces (the metadata header in this README is sufficient).
  • Without HF_TOKEN, the app still works with image upload search; text search will be disabled with a warning.

📁 Project Structure

visual-search-system/
├── app.py                 # Main Streamlit application (entry point)
├── download_images.py     # Optional: standalone downloader utility
├── photos_url.csv         # Dataset with image URLs
├── requirements.txt       # Python dependencies
├── README.md              # This file (contains HF Spaces metadata)
└── images/                # Downloaded images (created automatically)

🎯 How It Works

  1. On first run, the app reads photos_url.csv and downloads up to 250 images (configurable).
  2. It creates local visual embeddings using RGB histograms and saves them to embeddings/.
  3. In the UI you can:
    • Perform text search (requires HF_TOKEN) against openai/clip-vit-base-patch32 via Inference API.
    • Upload an image to find visually similar images using cosine similarity over local embeddings.
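As a rough illustration of steps 2 and 3, here is a minimal sketch of an RGB histogram embedding and a cosine-similarity lookup. The function names, the bin count of 8, and the choice to L2-normalize are assumptions made for this example; the actual code in app.py may differ.

```python
import numpy as np


def rgb_histogram_embedding(pixels: np.ndarray, bins: int = 8) -> np.ndarray:
    """Turn an H x W x 3 uint8 RGB array into a unit-length histogram vector."""
    channels = []
    for c in range(3):
        # One histogram per colour channel over the full 0..255 range.
        hist, _ = np.histogram(pixels[..., c], bins=bins, range=(0, 256))
        channels.append(hist.astype(np.float32))
    vec = np.concatenate(channels)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def cosine_top_k(query: np.ndarray, index: np.ndarray, k: int = 5) -> list:
    """Return the row indices of `index` most similar to `query`, best first."""
    # Query and rows are unit length, so the dot product is the cosine similarity.
    scores = index @ query
    return np.argsort(-scores)[:k].tolist()
```

With Pillow, the input array for an image file can be obtained with np.asarray(Image.open(path).convert("RGB")). Stacking the embeddings of all downloaded images with np.stack gives the index matrix that an uploaded image's embedding is compared against.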

📊 Dataset Information

This repository expects a photos_url.csv with at least one column containing HTTP/HTTPS image URLs. Images are stored as JPEG, optimized to ~800×800 pixels to balance quality and performance.
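One way to locate the URL column without hard-coding its name is sketched below. It uses the standard-library csv module to stay dependency-free, whereas the app itself lists pandas for CSV handling. The helper names and the sample column names are hypothetical.

```python
import csv
import io


def find_url_column(rows, fieldnames):
    """Return the first column whose non-empty values all look like HTTP(S) URLs."""
    for name in fieldnames:
        values = [row[name] for row in rows if row.get(name)]
        if values and all(v.startswith(("http://", "https://")) for v in values):
            return name
    return None


def load_image_urls(csv_text, limit=250):
    """Parse CSV text and return up to `limit` image URLs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    column = find_url_column(rows, reader.fieldnames or [])
    if column is None:
        return []
    return [row[column] for row in rows if row.get(column)][:limit]
```

For a file on disk, pass the contents of photos_url.csv, for example open("photos_url.csv", encoding="utf-8").read().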

🛠️ Technical Details

Dependencies

  • streamlit - web interface
  • pandas - CSV handling
  • requests - HTTP downloads
  • pillow - image processing
  • numpy - embeddings and similarity
  • tqdm - used by download_images.py (optional utility)

Performance Features

  • Parallel downloads with retries and exponential backoff
  • Atomic writes for embedding/index files to avoid corruption
  • Progress persisted to progress.json for resilience
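The retry and atomic-write ideas above can be sketched with the standard library alone. This is an illustration under assumptions, not the code in app.py: the function names, the retry count, the base delay, and the shape of the progress data are all hypothetical.

```python
import json
import os
import tempfile
import time


def retry_with_backoff(func, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call func() until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))


def atomic_write_json(path, data):
    """Write JSON to a temp file in the same directory, then rename over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace swaps the file in a single step on the same filesystem,
        # so readers see either the old file or the new one, never a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A download step could be wrapped as retry_with_backoff(lambda: fetch(url)), where fetch is whatever function performs the HTTP request, and progress could be saved with atomic_write_json("progress.json", {...}) after each batch.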

🔧 Configuration

Environment Variables

  • HF_TOKEN (optional): Hugging Face token to enable text search via Inference API.
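A minimal sketch of how an app might read HF_TOKEN and decide whether to offer text search. The helper names are hypothetical; the Bearer authorization header is the standard way to pass a Hugging Face token to the Inference API.

```python
import os


def text_search_enabled(environ=os.environ) -> bool:
    """Text search is only offered when a non-empty HF_TOKEN is set."""
    return bool(environ.get("HF_TOKEN", "").strip())


def auth_headers(environ=os.environ) -> dict:
    """Build the request headers for the Inference API, or {} without a token."""
    token = environ.get("HF_TOKEN", "").strip()
    return {"Authorization": f"Bearer {token}"} if token else {}
```

On Spaces, a secret named HF_TOKEN is exposed to the app as an environment variable, so the same check works locally and when deployed.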

Customization (in app.py)

  • MAX_IMAGES: number of images to process (default 250)
  • MAX_WORKERS: parallel download workers (default 6)
  • TARGET_MAX_SIZE: image resize target (default 800×800)
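For orientation, the constants might look like the sketch below, together with one way TARGET_MAX_SIZE could be applied using Pillow. Representing the size as a (width, height) tuple and the helper shrink_to_target are assumptions; check app.py for the actual definitions.

```python
from PIL import Image

# Defaults as documented above; the tuple form of TARGET_MAX_SIZE is assumed.
MAX_IMAGES = 250
MAX_WORKERS = 6
TARGET_MAX_SIZE = (800, 800)


def shrink_to_target(image, max_size=TARGET_MAX_SIZE):
    """Return an RGB copy that fits inside max_size, keeping the aspect ratio."""
    out = image.convert("RGB")  # convert() returns a copy, the input is untouched
    out.thumbnail(max_size)     # resizes in place and never enlarges
    return out
```

Because thumbnail only shrinks, images already smaller than the target are left at their original size.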

🚨 Troubleshooting

Common Issues

  1. Space fails to start (HF Spaces)

    • Ensure the SDK in the Space is set to Streamlit and this README has the metadata block
    • Confirm app.py and requirements.txt exist at the repo root
  2. Image download failures

    • Check internet connection
    • Verify photos_url.csv is present
    • Check available disk space
    • Reduce MAX_WORKERS if hitting rate limits
  3. Text search not working

    • Add HF_TOKEN as a Space secret
    • Ensure the CLIP model endpoint is reachable

Performance Tips

  • Faster Downloads: Increase max_workers in the download functions (MAX_WORKERS in app.py); lower it again if you hit rate limits
  • Memory Usage: Reduce MAX_DISPLAY_IMAGES for lower memory usage
  • Image Quality: Adjust JPEG quality in download_images.py

📝 License

This project is open source under the MIT license (license: mit in the metadata header). Feel free to modify and distribute.

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

📞 Support

If you encounter issues:

  1. Check the troubleshooting section above
  2. Review the console output for error messages
  3. Ensure all required files are present
  4. Verify Python version compatibility

Built with ❤️ using Streamlit and Python