---
title: Visual Search System
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: "1.37.0"
app_file: app.py
pinned: false
license: mit
---

# 🔍 Visual Search System

A Streamlit app that downloads the images listed in `photos_url.csv`, builds lightweight visual embeddings, and lets you search either by text (optional, via the Hugging Face Inference API) or by uploading an image.

## ✨ Features

- **📥 Automatic downloads**: Pulls images from `photos_url.csv` with retries and optimization
- **🧠 Embeddings**: Creates simple, robust RGB histogram embeddings locally (no GPU needed)
- **🔤 Text search (optional)**: Uses `openai/clip-vit-base-patch32` via HF Inference API when `HF_TOKEN` is provided
- **📁 Image similarity search**: Upload an image and find visually similar images
- **📱 Modern UI**: Streamlit interface with responsive layout and status tracking

## 🚀 Quick Start

### Local Development

1. **Clone the repository:**
   ```bash
   git clone <your-repo-url>
   cd visual-search-system
   ```

2. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

3. **Run the app:**
   ```bash
   streamlit run app.py
   ```

### Hugging Face Spaces Deployment

1. Create a new Space and select the SDK: `Streamlit`.
2. Ensure this repository contains at least: `app.py`, `photos_url.csv`, `requirements.txt`, `README.md`.
3. Optional: Set a Space secret named `HF_TOKEN` if you want text search enabled.
   - In your Space, go to Settings → Secrets → Add `HF_TOKEN` (a valid Hugging Face token).
4. Push/Upload files. The build will install `requirements.txt` and start `app.py` automatically.

Notes:
- You do NOT need a Dockerfile for Streamlit Spaces (the metadata header in this README is sufficient).
- Without `HF_TOKEN`, the app still works with image upload search; text search will be disabled with a warning.

## 📁 Project Structure

```
visual-search-system/
├── app.py                 # Main Streamlit application (entry point)
├── download_images.py     # Optional: standalone downloader utility
├── photos_url.csv         # Dataset with image URLs
├── requirements.txt       # Python dependencies
├── README.md              # This file (contains HF Spaces metadata)
├── images/                # Downloaded images (created automatically)
└── embeddings/            # Saved embeddings (created automatically)
```

## 🎯 How It Works

1. On first run, the app reads `photos_url.csv` and downloads up to 250 images (configurable).
2. It creates local visual embeddings using RGB histograms and saves them to `embeddings/`.
3. In the UI you can:
   - Perform text search (requires `HF_TOKEN`) against `openai/clip-vit-base-patch32` via Inference API.
   - Upload an image to find visually similar images using cosine similarity over local embeddings.
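
The embedding and ranking steps above can be sketched as follows. This is a minimal illustration, not the code in `app.py`: the function names, the 8-bins-per-channel choice, and the assumption that the image has already been decoded into an H×W×3 `uint8` array are all hypothetical.

```python
import numpy as np


def rgb_histogram_embedding(pixels: np.ndarray, bins: int = 8) -> np.ndarray:
    """Concatenate per-channel histograms of an HxWx3 uint8 array into a unit-length vector."""
    channels = []
    for c in range(3):  # R, G, B
        hist, _ = np.histogram(pixels[..., c], bins=bins, range=(0, 256))
        channels.append(hist.astype(np.float32))
    vec = np.concatenate(channels)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


def rank_by_similarity(query: np.ndarray, index: np.ndarray, top_k: int = 5) -> list:
    """Return row indices of `index` (N x D, unit-length rows), most similar first."""
    scores = index @ query  # dot product equals cosine similarity for unit vectors
    return np.argsort(-scores)[:top_k].tolist()
```

Because every embedding is normalized to unit length, ranking a whole index reduces to a single matrix-vector product, which is why this approach runs comfortably on a CPU.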

## 📊 Dataset Information

This repository expects a `photos_url.csv` with at least one column containing HTTP/HTTPS image URLs.
Downloaded images are re-encoded as JPEG and resized to fit within about 800×800 pixels (`TARGET_MAX_SIZE`) to balance quality and performance.
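
As a rough illustration of the "at least one URL column" requirement, the sketch below scans a CSV for the first column whose non-empty values all look like HTTP/HTTPS URLs. It uses only the standard library to stay dependency-free (the app itself lists `pandas` for CSV handling), and both the function name `find_url_column` and the column name `photo_image_url` in the usage note are assumptions made for the example.

```python
import csv
import io


def find_url_column(csv_text: str):
    """Return (column_name, urls) for the first column holding only http(s) URLs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    for name in reader.fieldnames or []:
        values = [(row.get(name) or "").strip() for row in rows]
        non_empty = [v for v in values if v]
        # A column qualifies only if it has data and every value is an http(s) URL.
        if non_empty and all(v.startswith(("http://", "https://")) for v in non_empty):
            return name, non_empty
    return None, []
```

For example, given a file with the header `id,photo_image_url`, the function skips `id` and returns `photo_image_url` together with its URLs.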

## 🛠️ Technical Details

### Dependencies
- `streamlit` - web interface
- `pandas` - CSV handling
- `requests` - HTTP downloads
- `pillow` - image processing
- `numpy` - embeddings and similarity
- `tqdm` - used by `download_images.py` (optional utility)

### Performance Features
- Parallel downloads with retries and exponential backoff
- Atomic writes for embedding/index files to avoid corruption
- Progress persisted to `progress.json` for resilience
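
The three techniques listed above could look roughly like this. It is a sketch under stated assumptions rather than the app's actual implementation: the helper names (`retry_with_backoff`, `run_parallel`, `atomic_write_json`), the delay values, and the use of `ThreadPoolExecutor` are illustrative choices.

```python
import json
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def retry_with_backoff(func, retries=3, base_delay=0.5, sleep=time.sleep):
    """Call func(); after each failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(retries):
        try:
            return func()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))


def run_parallel(tasks, max_workers=6):
    """Run zero-argument callables concurrently, each wrapped in retry logic."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(retry_with_backoff, tasks))


def atomic_write_json(path, data):
    """Write JSON to a temp file in the target directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Writing to a temporary file first means a crash mid-write leaves the previous `progress.json` untouched instead of a half-written file.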

## 🔧 Configuration

### Environment Variables
- `HF_TOKEN` (optional): Hugging Face token to enable text search via Inference API.
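
A minimal sketch of how the optional token might gate text search. The helper name `text_search_enabled` is hypothetical, and `app.py` may perform this check differently.

```python
import os


def text_search_enabled(env=None) -> bool:
    """Return True only when a non-blank HF_TOKEN is available."""
    env = os.environ if env is None else env
    return bool((env.get("HF_TOKEN") or "").strip())
```

When running locally, setting the variable in your shell (for example `export HF_TOKEN=...`) before `streamlit run app.py` has the same effect as a Space secret.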

### Customization (in `app.py`)
- `MAX_IMAGES`: number of images to process (default 250)
- `MAX_WORKERS`: parallel download workers (default 6)
- `TARGET_MAX_SIZE`: image resize target (default 800×800)

## 🚨 Troubleshooting

### Common Issues

1. **Space fails to start (HF Spaces)**
   - Ensure the SDK in the Space is set to Streamlit and this README has the metadata block
   - Confirm `app.py` and `requirements.txt` exist at the repo root

2. **Image download failures**
   - Check internet connection
   - Verify `photos_url.csv` is present
   - Check available disk space
   - Reduce `MAX_WORKERS` if hitting rate limits

3. **Text search not working**
   - Add `HF_TOKEN` as a Space secret (or set it as an environment variable when running locally)
   - Ensure the CLIP model endpoint is reachable

### Performance Tips

- **Faster Downloads**: Increase `MAX_WORKERS` in `app.py` (or the `max_workers` argument of the download functions)
- **Memory Usage**: Reduce `MAX_DISPLAY_IMAGES` for lower memory usage
- **Image Quality**: Adjust JPEG quality in `download_images.py`

## 📝 License

This project is open source under the MIT license (see the `license` field in the metadata header). Feel free to modify and distribute.

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request

## 📞 Support

If you encounter issues:
1. Check the troubleshooting section above
2. Review the console output for error messages
3. Ensure all required files are present
4. Verify Python version compatibility

---

**Built with ❤️ using Streamlit and Python**