metadata
title: Visual Search System
emoji: π
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.37.0
app_file: app.py
pinned: false
license: mit
Visual Search System
A Streamlit app that downloads images from photos_url.csv, builds lightweight visual embeddings, and lets you search by text (optional, via the Hugging Face Inference API) or by uploading an image.
Features
- Automatic downloads: Pulls images from photos_url.csv with retries and optimization
- Embeddings: Creates simple, robust RGB histogram embeddings locally (no GPU needed)
- Text search (optional): Uses openai/clip-vit-base-patch32 via the HF Inference API when HF_TOKEN is provided
- Image similarity search: Upload an image and find visually similar images
- Modern UI: Streamlit interface with responsive layout and status tracking
Quick Start
Local Development
Clone the repository:
git clone <your-repo-url>
cd visual-search-system
Install dependencies:
pip install -r requirements.txt
Run the app:
streamlit run app.py
Hugging Face Spaces Deployment
- Create a new Space and select the SDK: Streamlit.
- Ensure this repository contains at least: app.py, photos_url.csv, requirements.txt, README.md.
- Optional: Set a Space secret named HF_TOKEN if you want text search enabled.
  - In your Space, go to Settings → Secrets → Add HF_TOKEN (a valid Hugging Face token).
- Push/Upload files. The build will install requirements.txt and start app.py automatically.
Notes:
- You do NOT need a Dockerfile for Streamlit Spaces (the metadata header in this README is sufficient).
- Without HF_TOKEN, the app still works with image upload search; text search will be disabled with a warning.
Project Structure
visual-search-system/
├── app.py               # Main Streamlit application (entry point)
├── download_images.py   # Optional: standalone downloader utility
├── photos_url.csv       # Dataset with image URLs
├── requirements.txt     # Python dependencies
├── README.md            # This file (contains HF Spaces metadata)
└── images/              # Downloaded images (created automatically)
How It Works
- On first run, the app reads photos_url.csv and downloads up to 250 images (configurable).
- It creates local visual embeddings using RGB histograms and saves them to embeddings/.
- In the UI you can:
  - Perform text search (requires HF_TOKEN) against openai/clip-vit-base-patch32 via the Inference API.
  - Upload an image to find visually similar images using cosine similarity over local embeddings.
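The image-similarity path can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: the function names `rgb_histogram` and `top_k`, the choice of 8 bins per channel, and the L2 normalization are all assumptions made for the example.

```python
import numpy as np


def rgb_histogram(pixels: np.ndarray, bins: int = 8) -> np.ndarray:
    """Return an L2-normalized per-channel histogram for an (H, W, 3) uint8 image.

    Hypothetical helper: the bin count and normalization are assumptions.
    """
    if pixels.ndim != 3 or pixels.shape[2] != 3:
        raise ValueError("expected an (H, W, 3) RGB array")
    # One histogram per channel, concatenated into a single vector of 3 * bins values.
    channels = [
        np.histogram(pixels[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]
    vec = np.concatenate(channels).astype(np.float32)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def top_k(query: np.ndarray, index: np.ndarray, k: int = 5) -> list[tuple[int, float]]:
    """Rank rows of `index` by cosine similarity to `query`.

    Both inputs are assumed to be unit-normalized, so the dot product
    equals the cosine similarity.
    """
    scores = index @ query
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]
```

With Pillow installed, the pixel array for a file could be obtained with `np.asarray(Image.open(path).convert("RGB"))`. Normalizing once at indexing time keeps each query down to a single matrix-vector product.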
Dataset Information
This repository expects a photos_url.csv with at least one column containing HTTP/HTTPS image URLs.
Images are stored as JPEG, optimized to ~800×800 pixels to balance quality and performance.
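One way to satisfy the "at least one column containing HTTP/HTTPS image URLs" requirement is to scan the columns and take the first one that holds any URLs. The sketch below uses the standard-library csv module to stay self-contained (the app itself lists pandas for CSV handling); `find_image_urls` and the sample column name `photo_image_url` are hypothetical.

```python
import csv
import io


def find_image_urls(csv_text: str, limit: int = 250) -> list[str]:
    """Collect HTTP/HTTPS URLs from the first column that contains any.

    Hypothetical helper: the app's real column-selection logic may differ.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    for column in reader.fieldnames or []:
        # Missing cells come back as None, so normalize them to empty strings.
        values = [(row.get(column) or "").strip() for row in rows]
        urls = [v for v in values if v.lower().startswith(("http://", "https://"))]
        if urls:
            return urls[:limit]
    return []


sample = (
    "id,photo_image_url\n"
    "1,https://example.com/a.jpg\n"
    "2,not-a-url\n"
    "3,http://example.com/b.jpg\n"
)
urls = find_image_urls(sample)
```

The `limit` default mirrors the 250-image cap mentioned above; rows whose cell is not a URL are skipped instead of failing the whole load.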
Technical Details
Dependencies
- streamlit - web interface
- pandas - CSV handling
- requests - HTTP downloads
- pillow - image processing
- numpy - embeddings and similarity
- tqdm - used by download_images.py (optional utility)
Performance Features
- Parallel downloads with retries and exponential backoff
- Atomic writes for embedding/index files to avoid corruption
- Progress persisted to progress.json for resilience
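Two of these techniques, retries with exponential backoff and atomic file writes, can be illustrated with the standard library alone. This is a sketch under assumptions: `retry_with_backoff` and `atomic_write_json` are hypothetical names, and the attempt count and base delay are illustrative values, not the app's settings.

```python
import json
import os
import tempfile
import time


def retry_with_backoff(fn, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(); after each failure wait base_delay * 2**n, up to `attempts` tries."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))


def atomic_write_json(path, data):
    """Write JSON to a temp file in the target directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # The rename is atomic on POSIX when both paths share a filesystem,
        # so readers see either the old file or the complete new one.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A download worker could wrap its HTTP call in `retry_with_backoff` and then record completion with `atomic_write_json("progress.json", state)`, so an interrupted run never leaves a half-written progress file behind.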
Configuration
Environment Variables
- HF_TOKEN (optional): Hugging Face token to enable text search via the Inference API.
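A minimal sketch of how an optional token might gate text search: the helper name `inference_headers` is hypothetical, and the actual check in app.py may differ. The Bearer authorization header is the standard way to pass a Hugging Face token on API requests.

```python
import os


def inference_headers(env=None):
    """Return auth headers if HF_TOKEN is set, else None (text search disabled).

    Hypothetical helper; `env` is injectable so the check is easy to test.
    """
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if not token:
        return None
    return {"Authorization": f"Bearer {token}"}


headers = inference_headers({"HF_TOKEN": " hf_example "})
```

When the function returns None, the UI could show a notice with `st.warning(...)` and leave the image-upload search available.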
Customization (in app.py)
- MAX_IMAGES: number of images to process (default 250)
- MAX_WORKERS: parallel download workers (default 6)
- TARGET_MAX_SIZE: image resize target (default 800×800)
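As an illustration of what a resize target like TARGET_MAX_SIZE typically means, the sketch below computes output dimensions that fit inside the target while preserving aspect ratio and never upscaling. That interpretation is an assumption, and `fit_within` is a hypothetical helper, not a function from app.py.

```python
def fit_within(width, height, max_size=(800, 800)):
    """Scale (width, height) down to fit inside max_size, preserving aspect ratio."""
    max_w, max_h = max_size
    scale = min(max_w / width, max_h / height, 1.0)  # 1.0 cap: never upscale
    return max(1, round(width * scale)), max(1, round(height * scale))


landscape = fit_within(1600, 1200)
small = fit_within(400, 300)
```

Pillow's `Image.thumbnail((800, 800))` applies the same fit-inside rule in place, so a helper like this is mainly useful for estimating output sizes before processing.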
Troubleshooting
Common Issues
Space fails to start (HF Spaces)
- Ensure the SDK in the Space is set to Streamlit and this README has the metadata block
- Confirm app.py and requirements.txt exist at the repo root
Image download failures
- Check internet connection
- Verify photos_url.csv is present
- Check available disk space
- Reduce MAX_WORKERS if hitting rate limits
Text search not working
- Add HF_TOKEN as a Space secret
- Ensure the CLIP model endpoint is reachable
Performance Tips
- Faster Downloads: Increase max_workers in download functions
- Memory Usage: Reduce MAX_DISPLAY_IMAGES for lower memory usage
- Image Quality: Adjust JPEG quality in download_images.py
License
This project is open source under the MIT license, as declared in the metadata header. Feel free to modify and distribute.
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
Support
If you encounter issues:
- Check the troubleshooting section above
- Review the console output for error messages
- Ensure all required files are present
- Verify Python version compatibility
Built with ❤️ using Streamlit and Python