---
title: Visual Search System
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: "1.37.0"
app_file: app.py
pinned: false
license: mit
---

# 🔍 Visual Search System

A Streamlit app that downloads images from `photos_url.csv`, builds lightweight visual embeddings, and lets you search either by text (optional, via the Hugging Face Inference API) or by uploading an image.

## ✨ Features

- **📥 Automatic downloads**: Pulls images from `photos_url.csv` with retries and optimization
- **🧠 Embeddings**: Creates simple, robust RGB histogram embeddings locally (no GPU needed)
- **🔤 Text search (optional)**: Uses `openai/clip-vit-base-patch32` via the HF Inference API when `HF_TOKEN` is provided
- **📁 Image similarity search**: Upload an image and find visually similar images
- **📱 Modern UI**: Streamlit interface with responsive layout and status tracking

## 🚀 Quick Start

### Local Development

1. **Clone the repository:**
   ```bash
   git clone <repository-url>
   cd visual-search-system
   ```

2. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

3. **Run the app:**
   ```bash
   streamlit run app.py
   ```

### Hugging Face Spaces Deployment

1. Create a new Space and select the SDK: `Streamlit`.
2. Ensure this repository contains at least: `app.py`, `photos_url.csv`, `requirements.txt`, `README.md`.
3. Optional: Set a Space secret named `HF_TOKEN` if you want text search enabled.
   - In your Space, go to Settings → Secrets → Add `HF_TOKEN` (a valid Hugging Face token).
4. Push or upload the files. The build installs `requirements.txt` and starts `app.py` automatically.

Notes:

- You do NOT need a Dockerfile for Streamlit Spaces (the metadata header in this README is sufficient).
- Without `HF_TOKEN`, the app still works with image upload search; text search is disabled with a warning.

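The optional-token behaviour described in the notes above could be handled roughly as follows. This is a minimal sketch, not the code in `app.py`: the helper names `get_hf_token` and `search_modes` are hypothetical, and the only thing taken from this README is the `HF_TOKEN` variable name.

```python
import os
from typing import Mapping, Optional


def get_hf_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the Hugging Face token if set and non-empty, else None.

    `env` defaults to the process environment; passing a dict makes
    the function easy to test. (Hypothetical helper, not from app.py.)
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def search_modes(env: Optional[Mapping[str, str]] = None) -> dict:
    """Report which search modes are available for the given environment."""
    token = get_hf_token(env)
    # Image upload search only needs local embeddings, so it is always on;
    # text search is enabled only when a token is present.
    return {"image_upload": True, "text": token is not None}


if __name__ == "__main__":
    modes = search_modes()
    if not modes["text"]:
        print("HF_TOKEN not set: text search is disabled.")
```

In a Streamlit app, the final `print` would more naturally be an `st.warning(...)` call shown next to the text search box.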
## 📁 Project Structure

```
visual-search-system/
├── app.py              # Main Streamlit application (entry point)
├── download_images.py  # Optional: standalone downloader utility
├── photos_url.csv      # Dataset with image URLs
├── requirements.txt    # Python dependencies
├── README.md           # This file (contains HF Spaces metadata)
└── images/             # Downloaded images (created automatically)
```

## 🎯 How It Works

1. On first run, the app reads `photos_url.csv` and downloads up to 250 images (configurable via `MAX_IMAGES`).
2. It creates local visual embeddings using RGB histograms and saves them to `embeddings/`.
3. In the UI you can:
   - Perform text search (requires `HF_TOKEN`) against `openai/clip-vit-base-patch32` via the Inference API.
   - Upload an image to find visually similar images using cosine similarity over the local embeddings.

## 📊 Dataset Information

This repository expects a `photos_url.csv` with at least one column containing HTTP/HTTPS image URLs. Images are stored as JPEG and resized to roughly 800×800 pixels to balance quality and performance.

## 🛠️ Technical Details

### Dependencies

- `streamlit` - web interface
- `pandas` - CSV handling
- `requests` - HTTP downloads
- `pillow` - image processing
- `numpy` - embeddings and similarity
- `tqdm` - used by `download_images.py` (optional utility)

### Performance Features

- Parallel downloads with retries and exponential backoff
- Atomic writes for embedding/index files to avoid corruption
- Progress persisted to `progress.json` for resilience

## 🔧 Configuration

### Environment Variables

- `HF_TOKEN` (optional): Hugging Face token to enable text search via the Inference API.

### Customization (in `app.py`)

- `MAX_IMAGES`: number of images to process (default 250)
- `MAX_WORKERS`: parallel download workers (default 6)
- `TARGET_MAX_SIZE`: image resize target (default 800×800)

## 🚨 Troubleshooting

### Common Issues

1. **Space fails to start (HF Spaces)**
   - Ensure the SDK in the Space is set to Streamlit and this README starts with the metadata block
   - Confirm `app.py` and `requirements.txt` exist at the repo root

2. **Image download failures**
   - Check your internet connection
   - Verify `photos_url.csv` is present
   - Check available disk space
   - Reduce `MAX_WORKERS` if you are hitting rate limits

3. **Text search not working**
   - Add `HF_TOKEN` as a Space secret
   - Ensure the CLIP model endpoint is reachable

### Performance Tips

- **Faster Downloads**: Increase `MAX_WORKERS` in `app.py` (or `max_workers` in the download functions); lower it again if you hit rate limits
- **Memory Usage**: Reduce `MAX_DISPLAY_IMAGES` for lower memory usage
- **Image Quality**: Adjust JPEG quality in `download_images.py`

## 📝 License

This project is open source under the MIT license (as declared in the metadata header). Feel free to modify and distribute.

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request

## 📞 Support

If you encounter issues:

1. Check the troubleshooting section above
2. Review the console output for error messages
3. Ensure all required files are present
4. Verify Python version compatibility

---

**Built with ❤️ using Streamlit and Python**
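
## 🧪 Appendix: Embedding Sketch

The "How It Works" section describes RGB histogram embeddings compared by cosine similarity. The sketch below illustrates that idea with `numpy` only. It is not the code in `app.py`: the function names `rgb_histogram` and `top_k_similar`, the choice of 8 bins per channel, and the L2 normalization are assumptions made for illustration.

```python
import numpy as np


def rgb_histogram(pixels: np.ndarray, bins: int = 8) -> np.ndarray:
    """Embed an (H, W, 3) uint8 RGB array as concatenated channel histograms.

    The result has 3 * bins entries and is L2-normalized, so the dot
    product of two embeddings equals their cosine similarity.
    (Bin count and normalization are assumptions, not taken from app.py.)
    """
    channels = []
    for c in range(3):
        hist, _ = np.histogram(pixels[..., c], bins=bins, range=(0, 256))
        channels.append(hist)
    vec = np.concatenate(channels).astype(np.float32)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def top_k_similar(query: np.ndarray, index: np.ndarray, k: int = 5) -> list:
    """Return up to k (row, score) pairs from `index`, best match first.

    `index` is an (N, D) matrix of unit-length embeddings and `query`
    is a unit-length vector of size D.
    """
    scores = index @ query              # cosine similarity per row
    order = np.argsort(-scores)[:k]     # highest score first
    return [(int(i), float(scores[i])) for i in order]


if __name__ == "__main__":
    # Two synthetic 4x4 images: one pure red, one pure blue.
    red = np.zeros((4, 4, 3), dtype=np.uint8)
    red[..., 0] = 255
    blue = np.zeros((4, 4, 3), dtype=np.uint8)
    blue[..., 2] = 255

    index = np.stack([rgb_histogram(blue), rgb_histogram(red)])
    print(top_k_similar(rgb_histogram(red), index, k=2))
```

To embed a real file, the pixel array can be obtained with Pillow via `np.asarray(Image.open(path).convert("RGB"))`.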