---
title: Visual Search System
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: "1.37.0"
app_file: app.py
pinned: false
license: mit
---
# 🔍 Visual Search System
A Streamlit app that downloads images from `photos_url.csv`, builds lightweight visual embeddings, and lets you search either by text (optional, via the Hugging Face Inference API) or by uploading an image.
## ✨ Features
- **πŸ“₯ Automatic downloads**: Pulls images from `photos_url.csv` with retries and optimization
- **🧠 Embeddings**: Creates simple, robust RGB histogram embeddings locally (no GPU needed)
- **πŸ”€ Text search (optional)**: Uses `openai/clip-vit-base-patch32` via HF Inference API when `HF_TOKEN` is provided
- **πŸ“ Image similarity search**: Upload an image and find visually similar images
- **πŸ“± Modern UI**: Streamlit interface with responsive layout and status tracking
## πŸš€ Quick Start
### Local Development
1. **Clone the repository:**
```bash
git clone <your-repo-url>
cd visual-search-system
```
2. **Install dependencies:**
```bash
pip install -r requirements.txt
```
3. **Run the app:**
```bash
streamlit run app.py
```
### Hugging Face Spaces Deployment
1. Create a new Space and select the SDK: `Streamlit`.
2. Ensure this repository contains at least: `app.py`, `photos_url.csv`, `requirements.txt`, `README.md`.
3. Optional: Set a Space secret named `HF_TOKEN` if you want text search enabled.
   - In your Space, go to Settings → Secrets → Add `HF_TOKEN` (a valid Hugging Face token).
4. Push/Upload files. The build will install `requirements.txt` and start `app.py` automatically.
Notes:
- You do NOT need a Dockerfile for Streamlit Spaces (the metadata header in this README is sufficient).
- Without `HF_TOKEN`, the app still works with image upload search; text search will be disabled with a warning.
## 📁 Project Structure
```
visual-search-system/
├── app.py               # Main Streamlit application (entry point)
├── download_images.py   # Optional: standalone downloader utility
├── photos_url.csv       # Dataset with image URLs
├── requirements.txt     # Python dependencies
├── README.md            # This file (contains HF Spaces metadata)
└── images/              # Downloaded images (created automatically)
```
## 🎯 How It Works
1. On first run, the app reads `photos_url.csv` and downloads up to 250 images (configurable).
2. It creates local visual embeddings using RGB histograms and saves them to `embeddings/`.
3. In the UI you can:
   - Perform text search (requires `HF_TOKEN`) against `openai/clip-vit-base-patch32` via Inference API.
   - Upload an image to find visually similar images using cosine similarity over local embeddings.
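
To make steps 2 and 3 concrete, here is a minimal sketch of a per-channel RGB histogram embedding and a cosine-similarity ranking with NumPy. The function names, the choice of 8 bins per channel, and the L2 normalization are assumptions made for illustration; the layout actually used in `app.py` may differ.

```python
from typing import List, Tuple

import numpy as np

BINS_PER_CHANNEL = 8  # assumed value; the app's real bin count is not documented here


def rgb_histogram_embedding(pixels: np.ndarray, bins: int = BINS_PER_CHANNEL) -> np.ndarray:
    """Return an L2-normalized vector of concatenated R, G, B histograms.

    `pixels` is an (H, W, 3) uint8 array of RGB values.
    """
    channels = []
    for c in range(3):
        # Count how many pixel values of this channel fall into each bin.
        hist, _ = np.histogram(pixels[..., c], bins=bins, range=(0, 256))
        channels.append(hist)
    vec = np.concatenate(channels).astype(np.float32)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def top_k_similar(query: np.ndarray, index: np.ndarray, k: int = 5) -> List[Tuple[int, float]]:
    """Rank rows of `index` (N, D) by cosine similarity to `query` (D,).

    Both are assumed to be L2-normalized, so the dot product equals the cosine.
    """
    scores = index @ query
    order = np.argsort(-scores)[:k]  # highest similarity first
    return [(int(i), float(scores[i])) for i in order]
```

With Pillow, an input array for `rgb_histogram_embedding` can be obtained via `np.asarray(Image.open(path).convert("RGB"))`. Histograms ignore spatial layout, so two images with similar colour distributions score as similar even when their content differs.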
## 📊 Dataset Information
This repository expects a `photos_url.csv` with at least one column containing HTTP/HTTPS image URLs.
Images are stored as JPEG, optimized to ~800×800 pixels to balance quality and performance.
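
Because the CSV only needs "at least one column" of URLs, a loader does not have to rely on a specific column name. The following is a sketch of one way to collect URLs from any column using only the standard library; the function name and the de-duplication step are assumptions, not necessarily what `app.py` does.

```python
import csv
from typing import Iterable, List


def extract_image_urls(lines: Iterable[str], max_images: int = 250) -> List[str]:
    """Collect up to `max_images` unique HTTP/HTTPS URLs found in any CSV column."""
    urls: List[str] = []
    if max_images <= 0:
        return urls
    seen = set()
    for row in csv.reader(lines):
        for cell in row:
            value = cell.strip()
            # Header cells and non-URL columns are skipped by the prefix check.
            if value.startswith(("http://", "https://")) and value not in seen:
                seen.add(value)
                urls.append(value)
                if len(urls) >= max_images:
                    return urls
    return urls
```

A typical call would be `extract_image_urls(open("photos_url.csv", newline="", encoding="utf-8"))`.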
## 🛠️ Technical Details
### Dependencies
- `streamlit` - web interface
- `pandas` - CSV handling
- `requests` - HTTP downloads
- `pillow` - image processing
- `numpy` - embeddings and similarity
- `tqdm` - used by `download_images.py` (optional utility)
### Performance Features
- Parallel downloads with retries and exponential backoff
- Atomic writes for embedding/index files to avoid corruption
- Progress persisted to `progress.json` for resilience
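
The retry and atomic-write patterns listed above can be sketched with the standard library alone. The helper names, the default delays, and the `{"downloaded": ...}` payload shape are hypothetical; they illustrate the techniques rather than reproduce the code in `app.py`.

```python
import json
import os
import tempfile
import time
from typing import Any, Callable


def retry_with_backoff(func: Callable[[], Any], retries: int = 3, base_delay: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call `func`; on failure wait base_delay * 2**attempt and retry, up to `retries` times."""
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))


def atomic_write_json(path: str, payload: Any) -> None:
    """Write JSON to a temp file in the target directory, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())  # make sure the bytes are on disk before renaming
        os.replace(tmp_path, path)  # readers see either the old file or the new one
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

For example, `atomic_write_json("progress.json", {"downloaded": 42})` leaves the previous `progress.json` untouched if the process is interrupted before the rename.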
## 🔧 Configuration
### Environment Variables
- `HF_TOKEN` (optional): Hugging Face token to enable text search via Inference API.
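
Space secrets are exposed to the app as environment variables, so the token can be read with `os.environ`. The sketch below shows one way to gate text search on it; the helper names are hypothetical, and treating a blank value as "unset" is an assumption.

```python
import os
from typing import Mapping, Optional


def get_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the token with surrounding whitespace stripped, or None if unset or blank."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def text_search_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Text search is only offered when a token is available."""
    return get_hf_token(env) is not None
```

In the Streamlit UI, a check such as `if not text_search_enabled(): st.warning(...)` would match the documented behaviour of disabling text search with a warning.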
### Customization (in `app.py`)
- `MAX_IMAGES`: number of images to process (default 250)
- `MAX_WORKERS`: parallel download workers (default 6)
- `TARGET_MAX_SIZE`: image resize target (default 800×800)
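
As a rough illustration of how these settings might be used, the sketch below declares the three constants with their documented defaults and shows an aspect-ratio-preserving size calculation plus a thread-pool helper. The tuple form of `TARGET_MAX_SIZE` and the `fit_within` and `run_parallel` helpers are assumptions, not names taken from `app.py`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple, TypeVar

# Defaults quoted in this README; their exact form in app.py is assumed.
MAX_IMAGES = 250
MAX_WORKERS = 6
TARGET_MAX_SIZE = (800, 800)

T = TypeVar("T")
R = TypeVar("R")


def fit_within(size: Tuple[int, int], box: Tuple[int, int] = TARGET_MAX_SIZE) -> Tuple[int, int]:
    """Scale (width, height) down to fit inside `box`, keeping the aspect ratio.

    Images already smaller than the box are left at their original size.
    """
    width, height = size
    scale = min(box[0] / width, box[1] / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def run_parallel(func: Callable[[T], R], items: Iterable[T],
                 max_workers: int = MAX_WORKERS) -> List[R]:
    """Apply `func` to each item on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))
```

Pillow's `Image.thumbnail(TARGET_MAX_SIZE)` performs a comparable in-place, aspect-preserving downscale. Threads suit this workload because downloads spend most of their time waiting on the network.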
## 🚨 Troubleshooting
### Common Issues
1. **Space fails to start (HF Spaces)**
   - Ensure the SDK in the Space is set to Streamlit and this README has the metadata block
   - Confirm `app.py` and `requirements.txt` exist at the repo root
2. **Image download failures**
   - Check internet connection
   - Verify `photos_url.csv` is present
   - Check available disk space
   - Reduce `MAX_WORKERS` if hitting rate limits
3. **Text search not working**
   - Add `HF_TOKEN` as a Space secret
   - Ensure the CLIP model endpoint is reachable
### Performance Tips
- **Faster Downloads**: Increase `max_workers` in download functions, but lower it again if the image host starts rate limiting
- **Memory Usage**: Reduce `MAX_DISPLAY_IMAGES` for lower memory usage
- **Image Quality**: Adjust JPEG quality in `download_images.py`
## 📝 License
This project is open source under the MIT license, as declared by `license: mit` in the metadata header. Feel free to modify and distribute.
## 🀝 Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request
## 📞 Support
If you encounter issues:
1. Check the troubleshooting section above
2. Review the console output for error messages
3. Ensure all required files are present
4. Verify Python version compatibility
---
**Built with ❤️ using Streamlit and Python**