---
title: Visual Search System
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: "1.37.0"
app_file: app.py
pinned: false
license: mit
---
# Visual Search System
A Streamlit app that downloads images from `photos_url.csv`, builds lightweight visual embeddings, and lets you search by text (optional, via Hugging Face Inference API) or by uploading an image.
## Features
- **Automatic downloads**: Pulls images from `photos_url.csv` with retries and optimization
- **Embeddings**: Creates simple, robust RGB histogram embeddings locally (no GPU needed)
- **Text search (optional)**: Uses `openai/clip-vit-base-patch32` via HF Inference API when `HF_TOKEN` is provided
- **Image similarity search**: Upload an image and find visually similar images
- **Modern UI**: Streamlit interface with responsive layout and status tracking
## Quick Start
### Local Development
1. **Clone the repository:**
```bash
git clone <your-repo-url>
cd visual-search-system
```
2. **Install dependencies:**
```bash
pip install -r requirements.txt
```
3. **Run the app:**
```bash
streamlit run app.py
```
### Hugging Face Spaces Deployment
1. Create a new Space and select the SDK: `Streamlit`.
2. Ensure this repository contains at least: `app.py`, `photos_url.csv`, `requirements.txt`, `README.md`.
3. Optional: Set a Space secret named `HF_TOKEN` if you want text search enabled.
   - In your Space, go to Settings → Secrets → Add `HF_TOKEN` (a valid Hugging Face token).
4. Push/Upload files. The build will install `requirements.txt` and start `app.py` automatically.
Notes:
- You do NOT need a Dockerfile for Streamlit Spaces (the metadata header in this README is sufficient).
- Without `HF_TOKEN`, the app still works with image upload search; text search will be disabled with a warning.
## Project Structure
```
visual-search-system/
├── app.py               # Main Streamlit application (entry point)
├── download_images.py   # Optional: standalone downloader utility
├── photos_url.csv       # Dataset with image URLs
├── requirements.txt     # Python dependencies
├── README.md            # This file (contains HF Spaces metadata)
└── images/              # Downloaded images (created automatically)
```
## How It Works
1. On first run, the app reads `photos_url.csv` and downloads up to 250 images (configurable).
2. It creates local visual embeddings using RGB histograms and saves them to `embeddings/`.
3. In the UI you can:
   - Perform text search (requires `HF_TOKEN`) against `openai/clip-vit-base-patch32` via Inference API.
   - Upload an image to find visually similar images using cosine similarity over local embeddings.
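The embedding and similarity steps above can be sketched roughly as follows. This is an illustration, not the code in `app.py`: the function names, the 8 bins per channel, and the unit-length normalization are all assumptions made for the example.

```python
import numpy as np


def rgb_histogram_embedding(pixels: np.ndarray, bins: int = 8) -> np.ndarray:
    """Turn an H x W x 3 uint8 RGB array into one normalized histogram vector.

    Hypothetical helper: the bin count and normalization are assumptions.
    """
    if pixels.ndim != 3 or pixels.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB array")
    channels = []
    for c in range(3):
        # Count how many pixel values of this channel fall into each bin.
        hist, _ = np.histogram(pixels[:, :, c], bins=bins, range=(0, 256))
        channels.append(hist.astype(np.float64))
    vec = np.concatenate(channels)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


def rank_by_similarity(query: np.ndarray, index: dict, top_k: int = 5) -> list:
    """Return up to top_k (name, score) pairs from index, best match first."""
    scores = [(name, cosine_similarity(query, emb)) for name, emb in index.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]
```

To use it with real files, you could load an image with Pillow (`np.asarray(Image.open(path).convert("RGB"))`) and pass the resulting array to `rgb_histogram_embedding`. Histograms ignore where colors appear in the image, which keeps the embedding cheap but means two images with similar palettes and different content can score as close matches.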
## Dataset Information
This repository expects a `photos_url.csv` with at least one column containing HTTP/HTTPS image URLs.
Images are stored as JPEG, optimized to ~800×800 pixels to balance quality and performance.
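As a rough sketch of the CSV expectation described above, the snippet below picks the first column whose non-empty values all look like HTTP/HTTPS URLs. The helper name `extract_image_urls` and the column name `photo_image_url` in the usage example are hypothetical; the app itself lists `pandas` for CSV handling and may select the column differently.

```python
import csv
import io
from typing import List, Optional


def extract_image_urls(csv_text: str, limit: Optional[int] = None) -> List[str]:
    """Return URLs from the first column whose values all start with http(s)://.

    Hypothetical helper for illustration; not taken from app.py.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    for column in reader.fieldnames or []:
        # Short rows yield None for missing cells, so normalize to "".
        values = [(row.get(column) or "").strip() for row in rows]
        non_empty = [v for v in values if v]
        if non_empty and all(
            v.lower().startswith(("http://", "https://")) for v in non_empty
        ):
            return non_empty[:limit]
    return []
```

For example, `extract_image_urls(open("photos_url.csv", encoding="utf-8").read(), limit=250)` would cap the result at the documented default of 250 images.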
## Technical Details
### Dependencies
- `streamlit` - web interface
- `pandas` - CSV handling
- `requests` - HTTP downloads
- `pillow` - image processing
- `numpy` - embeddings and similarity
- `tqdm` - used by `download_images.py` (optional utility)
### Performance Features
- Parallel downloads with retries and exponential backoff
- Atomic writes for embedding/index files to avoid corruption
- Progress persisted to `progress.json` for resilience
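Two of the ideas above, retries with exponential backoff and atomic file writes, can be sketched with the standard library alone. The function names and default values here are assumptions for illustration and are not taken from `app.py`; parallelism is left out to keep the example short.

```python
import json
import os
import tempfile
import time


def retry_with_backoff(func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(); after a failure wait base_delay * 2**n, then try again.

    The exception from the final attempt is re-raised to the caller.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


def atomic_write_json(path, data):
    """Write JSON to a temp file in the target directory, then swap it in.

    os.replace is atomic on the same filesystem, so readers see either the
    old file or the complete new one, never a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything went wrong before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A download loop could wrap each request in `retry_with_backoff` and call `atomic_write_json("progress.json", state)` after each batch, so an interrupted run leaves a readable progress file behind.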
## Configuration
### Environment Variables
- `HF_TOKEN` (optional): Hugging Face token to enable text search via Inference API.
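A minimal sketch of how the optional token might gate text search is shown below. The helper names are hypothetical, and the real app presumably surfaces the missing-token case as a Streamlit warning rather than a returned list.

```python
import os


def get_hf_token(env=None):
    """Return the stripped HF_TOKEN value, or None when it is unset or blank."""
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    return token or None


def available_search_modes(env=None):
    """Image search is always available; text search needs a token."""
    modes = ["image"]
    if get_hf_token(env) is not None:
        modes.insert(0, "text")
    return modes
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.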
### Customization (in `app.py`)
- `MAX_IMAGES`: number of images to process (default 250)
- `MAX_WORKERS`: parallel download workers (default 6)
- `TARGET_MAX_SIZE`: image resize target (default 800×800)
## Troubleshooting
### Common Issues
1. **Space fails to start (HF Spaces)**
   - Ensure the SDK in the Space is set to Streamlit and this README has the metadata block
   - Confirm `app.py` and `requirements.txt` exist at the repo root
2. **Image download failures**
   - Check internet connection
   - Verify `photos_url.csv` is present
   - Check available disk space
   - Reduce `MAX_WORKERS` if hitting rate limits
3. **Text search not working**
   - Add `HF_TOKEN` as a Space secret
   - Ensure the CLIP model endpoint is reachable
### Performance Tips
- **Faster Downloads**: Increase `MAX_WORKERS` in `app.py` (or `max_workers` in the download functions); lower it again if you start hitting rate limits
- **Memory Usage**: Reduce `MAX_DISPLAY_IMAGES` for lower memory usage
- **Image Quality**: Adjust JPEG quality in `download_images.py`
## License
This project is released under the MIT License (`license: mit` in the metadata header). Feel free to modify and distribute.
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request
## Support
If you encounter issues:
1. Check the troubleshooting section above
2. Review the console output for error messages
3. Ensure all required files are present
4. Verify Python version compatibility
---
**Built with ❤️ using Streamlit and Python**