	---
	title: Visual Search System
	emoji: 🔍
	colorFrom: blue
	colorTo: green
	sdk: streamlit
	sdk_version: "1.37.0"
	app_file: app.py
	pinned: false
	license: mit
	---

	# 🔍 Visual Search System

	A Streamlit app that downloads images from `photos_url.csv`, builds lightweight visual embeddings, and lets you search by text (optional, via Hugging Face Inference API) or by uploading an image.

	## ✨ Features

	- 📥 Automatic downloads: Pulls images from `photos_url.csv` with retries and optimization
	- 🧠 Embeddings: Creates simple, robust RGB histogram embeddings locally (no GPU needed)
	- 🔤 Text search (optional): Uses `openai/clip-vit-base-patch32` via HF Inference API when `HF_TOKEN` is provided
	- 📁 Image similarity search: Upload an image and find visually similar images
	- 📱 Modern UI: Streamlit interface with responsive layout and status tracking

	## 🚀 Quick Start

	### Local Development

	1. Clone the repository:
	```bash
	git clone <your-repo-url>
	cd visual-search-system
	```

	2. Install dependencies:
	```bash
	pip install -r requirements.txt
	```

	3. Run the app:
	```bash
	streamlit run app.py
	```

	### Hugging Face Spaces Deployment

	1. Create a new Space and select the SDK: `Streamlit`.
	2. Ensure this repository contains at least: `app.py`, `photos_url.csv`, `requirements.txt`, `README.md`.
	3. Optional: Set a Space secret named `HF_TOKEN` if you want text search enabled.
	   - In your Space, go to Settings → Secrets → Add `HF_TOKEN` (a valid Hugging Face token).
	4. Push/Upload files. The build will install `requirements.txt` and start `app.py` automatically.

	Notes:
	- You do NOT need a Dockerfile for Streamlit Spaces (the metadata header in this README is sufficient).
	- Without `HF_TOKEN`, the app still works with image upload search; text search will be disabled with a warning.

	## 📁 Project Structure

	```
	visual-search-system/
	├── app.py # Main Streamlit application (entry point)
	├── download_images.py # Optional: standalone downloader utility
	├── photos_url.csv # Dataset with image URLs
	├── requirements.txt # Python dependencies
	├── README.md # This file (contains HF Spaces metadata)
	└── images/ # Downloaded images (created automatically)
	```

	## 🎯 How It Works

	1. On first run, the app reads `photos_url.csv` and downloads up to 250 images (configurable via `MAX_IMAGES`).
	2. It creates local visual embeddings using RGB histograms and saves them to `embeddings/`.
	3. In the UI you can:
	   - Perform text search (requires `HF_TOKEN`) against `openai/clip-vit-base-patch32` via Inference API.
	   - Upload an image to find visually similar images using cosine similarity over local embeddings.
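
The embedding and similarity steps above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the code in `app.py`: the function names, the bin count (`BINS_PER_CHANNEL`), and the L2 normalisation are assumptions, and the app itself lists `numpy` and `pillow` for this work.

```python
import math

BINS_PER_CHANNEL = 8  # hypothetical bin count; the app's real value is not documented


def rgb_histogram(pixels, bins=BINS_PER_CHANNEL):
    """Build an L2-normalised histogram embedding from (r, g, b) pixels in 0..255."""
    hist = [0.0] * (3 * bins)
    for r, g, b in pixels:
        for channel, value in enumerate((r, g, b)):
            # Each channel owns its own block of `bins` slots.
            hist[channel * bins + value * bins // 256] += 1.0
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, index):
    """Return (name, score) pairs from a {name: embedding} dict, best match first."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With real images, the pixel list would come from a decoded image (for example via Pillow); here any iterable of `(r, g, b)` tuples works, which keeps the sketch easy to test.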

	## 📊 Dataset Information

	This repository expects a `photos_url.csv` with at least one column containing HTTP/HTTPS image URLs.
	Downloaded images are stored as JPEG and resized to roughly 800×800 pixels to balance quality and performance.
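
Because the only stated requirement is "at least one column containing HTTP/HTTPS image URLs", one way to read such a file is to scan each row for the first URL-like cell. The sketch below assumes exactly that; `extract_image_urls` is a hypothetical helper, and the app (which lists `pandas`) may select the column differently.

```python
import csv
import io


def extract_image_urls(csv_text, limit=250):
    """Collect HTTP/HTTPS URLs from whichever column holds them, up to `limit`."""
    urls = []
    for row in csv.reader(io.StringIO(csv_text)):
        for cell in row:
            value = cell.strip()
            if value.lower().startswith(("http://", "https://")):
                urls.append(value)
                break  # take at most one URL per row
        if len(urls) >= limit:
            break
    return urls
```

Header rows and rows without a URL are skipped naturally, since no cell in them starts with `http://` or `https://`.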

	## 🛠️ Technical Details

	### Dependencies
	- `streamlit` - web interface
	- `pandas` - CSV handling
	- `requests` - HTTP downloads
	- `pillow` - image processing
	- `numpy` - embeddings and similarity
	- `tqdm` - used by `download_images.py` (optional utility)

	### Performance Features
	- Parallel downloads with retries and exponential backoff
	- Atomic writes for embedding/index files to avoid corruption
	- Progress persisted to `progress.json` for resilience
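
Two of the techniques listed above, retries with exponential backoff and atomic writes, can be sketched with the standard library alone. The helper names, the retry count, and the base delay are assumptions for illustration; the actual values and structure in `app.py` are not documented here.

```python
import json
import os
import tempfile
import time


def with_retries(action, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call action(); after each failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception:  # broad on purpose in this sketch; narrow it in real code
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


def atomic_write_json(path, payload):
    """Write JSON to a temp file in the same directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace swaps the file in a single step, so readers never see a partial file.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A progress file could then be saved with `atomic_write_json("progress.json", {"downloaded": 42})`, where the payload shape is again only an example.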

	## 🔧 Configuration

	### Environment Variables
	- `HF_TOKEN` (optional): Hugging Face token to enable text search via Inference API.
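
As a rough illustration of how an optional token can gate a feature, the sketch below reads `HF_TOKEN` from the environment and builds a bearer-token header only when it is set. `text_search_config` and its return shape are hypothetical; the app's own check may look different.

```python
import os


def text_search_config(environ=os.environ):
    """Return whether text search can be enabled, plus the auth header to send."""
    token = environ.get("HF_TOKEN", "").strip()
    if not token:
        # No token: image-upload search still works, text search stays off.
        return {"enabled": False, "headers": {}}
    return {"enabled": True, "headers": {"Authorization": f"Bearer {token}"}}
```

In a Streamlit app, the `enabled` flag could decide whether to render the text-search box or show a warning instead.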

	### Customization (in `app.py`)
	- `MAX_IMAGES`: number of images to process (default 250)
	- `MAX_WORKERS`: parallel download workers (default 6)
	- `TARGET_MAX_SIZE`: image resize target (default 800×800)
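
To make the effect of these three settings concrete, here is a small sketch of how such constants might be consumed. Only the constant names and defaults come from this README; `fit_within`, `process_all`, and the aspect-preserving, never-upscale resize rule are assumptions (similar in spirit to Pillow's `Image.thumbnail`).

```python
from concurrent.futures import ThreadPoolExecutor

MAX_IMAGES = 250              # number of images to process
MAX_WORKERS = 6               # parallel download workers
TARGET_MAX_SIZE = (800, 800)  # bounding box for resized images


def fit_within(width, height, max_size=TARGET_MAX_SIZE):
    """Scale (width, height) to fit inside max_size, keeping aspect ratio, never upscaling."""
    scale = min(max_size[0] / width, max_size[1] / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def process_all(urls, fetch, max_images=MAX_IMAGES, max_workers=MAX_WORKERS):
    """Apply fetch(url) to the first max_images URLs using a bounded thread pool."""
    selected = list(urls)[:max_images]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, regardless of completion order.
        return list(pool.map(fetch, selected))
```

Lowering `MAX_WORKERS` reduces concurrent requests against the image host, while lowering `MAX_IMAGES` shortens the first run.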

	## 🚨 Troubleshooting

	### Common Issues

	1. Space fails to start (HF Spaces)
	   - Ensure the SDK in the Space is set to Streamlit and this README has the metadata block
	   - Confirm `app.py` and `requirements.txt` exist at the repo root

	2. Image download failures
	   - Check internet connection
	   - Verify `photos_url.csv` is present
	   - Check available disk space
	   - Reduce `MAX_WORKERS` if hitting rate limits

	3. Text search not working
	   - Add `HF_TOKEN` as a Space secret
	   - Ensure the CLIP model endpoint is reachable

	### Performance Tips

	- Faster Downloads: Increase `MAX_WORKERS` (default 6), but lower it again if you start hitting rate limits
	- Memory Usage: Reduce `MAX_DISPLAY_IMAGES` for lower memory usage
	- Image Quality: Adjust JPEG quality in `download_images.py`

	## 📝 License

	This project is open source under the MIT license, as declared in the metadata header of this README. Feel free to modify and distribute.

	## 🤝 Contributing

	1. Fork the repository
	2. Create a feature branch
	3. Make your changes
	4. Submit a pull request

	## 📞 Support

	If you encounter issues:
	1. Check the troubleshooting section above
	2. Review the console output for error messages
	3. Ensure all required files are present
	4. Verify Python version compatibility

	---

	Built with ❤️ using Streamlit and Python