---
title: Visual Search System
emoji: π
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: "1.37.0"
app_file: app.py
pinned: false
license: mit
---
# Visual Search System

A Streamlit app that downloads images from `photos_url.csv`, builds lightweight visual embeddings, and lets you search by text (optional, via the Hugging Face Inference API) or by uploading an image.
## Features

- **Automatic downloads**: Pulls images from `photos_url.csv` with retries and optimization
- **Embeddings**: Creates simple, robust RGB histogram embeddings locally (no GPU needed)
- **Text search (optional)**: Uses `openai/clip-vit-base-patch32` via the HF Inference API when `HF_TOKEN` is provided
- **Image similarity search**: Upload an image and find visually similar images
- **Modern UI**: Streamlit interface with responsive layout and status tracking
## Quick Start

### Local Development

1. **Clone the repository:**
   ```bash
   git clone <your-repo-url>
   cd visual-search-system
   ```
2. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```
3. **Run the app:**
   ```bash
   streamlit run app.py
   ```
### Hugging Face Spaces Deployment

1. Create a new Space and select the SDK: `Streamlit`.
2. Ensure this repository contains at least: `app.py`, `photos_url.csv`, `requirements.txt`, `README.md`.
3. Optional: Set a Space secret named `HF_TOKEN` if you want text search enabled.
   - In your Space, go to Settings → Secrets → Add `HF_TOKEN` (a valid Hugging Face token).
4. Push/Upload files. The build will install `requirements.txt` and start `app.py` automatically.

Notes:
- You do NOT need a Dockerfile for Streamlit Spaces (the metadata header in this README is sufficient).
- Without `HF_TOKEN`, the app still works with image upload search; text search will be disabled with a warning.
## Project Structure

```
visual-search-system/
├── app.py              # Main Streamlit application (entry point)
├── download_images.py  # Optional: standalone downloader utility
├── photos_url.csv      # Dataset with image URLs
├── requirements.txt    # Python dependencies
├── README.md           # This file (contains HF Spaces metadata)
└── images/             # Downloaded images (created automatically)
```
## How It Works

1. On first run, the app reads `photos_url.csv` and downloads up to 250 images (configurable via `MAX_IMAGES`).
2. It creates local visual embeddings using RGB histograms and saves them to `embeddings/`.
3. In the UI you can:
   - Perform text search (requires `HF_TOKEN`) against `openai/clip-vit-base-patch32` via the Inference API.
   - Upload an image to find visually similar images using cosine similarity over the local embeddings.
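
The embedding and similarity steps can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the function names and the choice of 8 bins per channel are assumptions. It takes an `(H, W, 3)` `uint8` array, which could be obtained from Pillow with something like `np.asarray(Image.open(path).convert("RGB"))`.

```python
import numpy as np


def rgb_histogram_embedding(pixels: np.ndarray, bins: int = 8) -> np.ndarray:
    """Concatenate per-channel histograms of an (H, W, 3) uint8 array.

    The result has 3 * bins entries and is L2-normalized.
    The bin count is an assumption; the app may use a different value.
    """
    channels = []
    for c in range(3):  # R, G, B
        hist, _ = np.histogram(pixels[..., c], bins=bins, range=(0, 256))
        channels.append(hist)
    vec = np.concatenate(channels).astype(np.float64)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


def rank_by_similarity(query: np.ndarray, index: np.ndarray, top_k: int = 5):
    """Rank rows of an (N, D) matrix of L2-normalized embeddings against query.

    For unit-length vectors the dot product equals the cosine similarity.
    Returns (row_index, score) pairs, most similar first.
    """
    scores = index @ query
    order = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in order]
```

Because color histograms ignore where pixels sit in the image, two pictures with similar palettes score as similar even when their content differs, which is the trade-off for not needing a GPU.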
## Dataset Information

This repository expects a `photos_url.csv` with at least one column containing HTTP/HTTPS image URLs.
Downloaded images are stored as JPEG and resized to roughly 800×800 pixels (see `TARGET_MAX_SIZE`) to balance quality and performance.
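
To illustrate what "at least one column containing HTTP/HTTPS image URLs" can mean in practice, here is a standard-library sketch that pulls URLs out of CSV text. The app itself lists `pandas` for CSV handling, so this is a stand-in rather than the actual loader. The function names and the rule of taking the first URL-like cell per row are assumptions.

```python
import csv
import io
from urllib.parse import urlparse


def is_http_url(value: str) -> bool:
    """True if value parses as an http:// or https:// URL with a host."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def extract_image_urls(csv_text: str, limit: int = 250) -> list[str]:
    """Return up to `limit` unique URLs, one per row, in file order.

    Header rows and rows without a URL-like cell are skipped automatically
    because none of their cells pass is_http_url().
    """
    urls: list[str] = []
    seen: set[str] = set()
    for row in csv.reader(io.StringIO(csv_text)):
        # Take the first cell in the row that looks like an HTTP(S) URL.
        url = next((cell.strip() for cell in row if is_http_url(cell)), None)
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
        if len(urls) >= limit:
            break
    return urls
```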
## Technical Details

### Dependencies

- `streamlit` - web interface
- `pandas` - CSV handling
- `requests` - HTTP downloads
- `pillow` - image processing
- `numpy` - embeddings and similarity
- `tqdm` - used by `download_images.py` (optional utility)

### Performance Features

- Parallel downloads with retries and exponential backoff
- Atomic writes for embedding/index files to avoid corruption
- Progress persisted to `progress.json` for resilience
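
The three techniques above can be sketched with the standard library alone. The helper names, retry count, and delays below are hypothetical, and the app's own implementation may differ; the point is only to show the general shape of retry-with-backoff, a thread pool, and a write-then-rename file update.

```python
import json
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def retry_with_backoff(func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(); after a failure wait base_delay * 2**n, then retry.

    The last failure is re-raised once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


def run_all(tasks, max_workers=6):
    """Run zero-argument callables in a thread pool, retrying each one."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(retry_with_backoff, tasks))


def atomic_write_json(path, data):
    """Write JSON to a temp file in the target directory, then rename it.

    Readers see either the old file or the complete new one, never a
    half-written file, as long as both paths are on the same filesystem.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A progress file could then be updated with a call such as `atomic_write_json("progress.json", {"downloaded": 120})`, where the dictionary keys are whatever the app chooses to track.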
## Configuration

### Environment Variables

- `HF_TOKEN` (optional): Hugging Face token to enable text search via the Inference API.
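
A minimal sketch of how an app might read this variable and decide whether to offer text search. The helper names are hypothetical and not taken from `app.py`; the `Authorization: Bearer` header is the usual way to pass a Hugging Face token on HTTP requests.

```python
import os


def get_hf_token(env=None):
    """Return the stripped HF_TOKEN value, or None if it is unset or blank."""
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    return token or None


def build_auth_headers(token):
    """HTTP headers carrying the token as a bearer credential."""
    return {"Authorization": f"Bearer {token}"}


token = get_hf_token()
if token is None:
    # Mirrors the documented behavior: image search works, text search does not.
    print("HF_TOKEN not set: text search disabled, image search still available.")
```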
### Customization (in `app.py`)

- `MAX_IMAGES`: number of images to process (default 250)
- `MAX_WORKERS`: parallel download workers (default 6)
- `TARGET_MAX_SIZE`: image resize target (default 800×800)
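
As a rough picture of how these settings might look and how a resize target is typically applied, consider the sketch below. The tuple form of `TARGET_MAX_SIZE`, the `fit_within` helper, and the choice to keep the aspect ratio without upscaling are all assumptions, not confirmed details of `app.py`.

```python
MAX_IMAGES = 250              # number of images to process
MAX_WORKERS = 6               # parallel download workers
TARGET_MAX_SIZE = (800, 800)  # assumed to be a (width, height) bounding box


def fit_within(width, height, max_size=TARGET_MAX_SIZE):
    """Scale (width, height) to fit inside max_size, keeping aspect ratio.

    Images that already fit are returned unchanged (no upscaling).
    """
    max_w, max_h = max_size
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 1600×1200 photo would come out as 800×600 under these assumptions, while a 400×300 thumbnail would be left alone.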
## Troubleshooting

### Common Issues

1. **Space fails to start (HF Spaces)**
   - Ensure the SDK in the Space is set to Streamlit and this README has the metadata block at the top
   - Confirm `app.py` and `requirements.txt` exist at the repo root
2. **Image download failures**
   - Check your internet connection
   - Verify `photos_url.csv` is present
   - Check available disk space
   - Reduce `MAX_WORKERS` if you are hitting rate limits
3. **Text search not working**
   - Add `HF_TOKEN` as a Space secret
   - Ensure the CLIP model endpoint is reachable
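
Several of the checks above come down to whether the expected files are present at the repo root. A small preflight script along these lines can confirm that before pushing. The `missing_files` helper is hypothetical, and the file list is the one given in the deployment steps.

```python
from pathlib import Path

# Files the deployment steps say the repository should contain.
REQUIRED_FILES = ("app.py", "requirements.txt", "photos_url.csv", "README.md")


def missing_files(repo_root=".", required=REQUIRED_FILES):
    """Return the required file names that are absent from repo_root."""
    root = Path(repo_root)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All required files are present.")
```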
### Performance Tips

- **Faster Downloads**: Increase `max_workers` in the download functions (lower it again if you start hitting rate limits)
- **Memory Usage**: Reduce `MAX_DISPLAY_IMAGES` for lower memory usage
- **Image Quality**: Adjust JPEG quality in `download_images.py`
## License

This project is open source under the MIT license (see the `license` field in the metadata header). Feel free to modify and distribute.
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request
## Support

If you encounter issues:
1. Check the troubleshooting section above
2. Review the console output for error messages
3. Ensure all required files are present
4. Verify Python version compatibility
---

**Built with ❤️ using Streamlit and Python**