metadata

title: Visual Search System
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.37.0
app_file: app.py
pinned: false
license: mit

🔍 Visual Search System

A Streamlit app that downloads images from photos_url.csv, builds lightweight visual embeddings, and lets you search by text (optional, via Hugging Face Inference API) or by uploading an image.

✨ Features

📥 Automatic downloads: Pulls images from photos_url.csv with retries and optimization
🧠 Embeddings: Creates simple, robust RGB histogram embeddings locally (no GPU needed)
🔤 Text search (optional): Uses openai/clip-vit-base-patch32 via HF Inference API when HF_TOKEN is provided
📁 Image similarity search: Upload an image and find visually similar images
📱 Modern UI: Streamlit interface with responsive layout and status tracking

🚀 Quick Start

Local Development

Clone the repository:

git clone <your-repo-url>
cd visual-search-system

Install dependencies:
```
pip install -r requirements.txt
```
Run the app:
```
streamlit run app.py
```

Hugging Face Spaces Deployment

Create a new Space and select the SDK: Streamlit.
Ensure this repository contains at least: app.py, photos_url.csv, requirements.txt, README.md.
Optional: Set a Space secret named HF_TOKEN if you want text search enabled.
- In your Space, go to Settings → Secrets → Add HF_TOKEN (a valid Hugging Face token).
Push/Upload files. The build will install requirements.txt and start app.py automatically.

Notes:

You do NOT need a Dockerfile for Streamlit Spaces (the metadata header in this README is sufficient).
Without HF_TOKEN, the app still works with image upload search; text search will be disabled with a warning.

📁 Project Structure

visual-search-system/
├── app.py                 # Main Streamlit application (entry point)
├── download_images.py     # Optional: standalone downloader utility
├── photos_url.csv         # Dataset with image URLs
├── requirements.txt       # Python dependencies
├── README.md              # This file (contains HF Spaces metadata)
└── images/                # Downloaded images (created automatically)

🎯 How It Works

On first run, the app reads photos_url.csv and downloads up to 250 images (configurable).
It creates local visual embeddings using RGB histograms and saves them to embeddings/.
In the UI you can:
- Perform text search (requires HF_TOKEN) against openai/clip-vit-base-patch32 via Inference API.
- Upload an image to find visually similar images using cosine similarity over local embeddings.
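
The embedding and ranking steps above can be sketched as follows. This is a minimal illustration, not the code in app.py: the function names, the bin count of 8, and the L2 normalization are assumptions. It takes an (H, W, 3) uint8 array, which you can get from a Pillow image with `np.asarray(img.convert("RGB"))`.

```python
import numpy as np


def rgb_histogram_embedding(pixels: np.ndarray, bins: int = 8) -> np.ndarray:
    """Concatenate per-channel histograms of an (H, W, 3) array, then L2-normalize.

    Hypothetical helper: the bin count and normalization are assumptions.
    """
    if pixels.ndim != 3 or pixels.shape[2] != 3:
        raise ValueError("expected an (H, W, 3) RGB array")
    channels = []
    for c in range(3):
        hist, _ = np.histogram(pixels[..., c], bins=bins, range=(0, 256))
        channels.append(hist)
    vec = np.concatenate(channels).astype(np.float32)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def rank_by_cosine(query: np.ndarray, index: np.ndarray, top_k: int = 5) -> list[tuple[int, float]]:
    """Rank rows of `index` (N, D) against `query` (D,).

    All vectors are unit length, so the dot product equals cosine similarity.
    """
    scores = index @ query
    order = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in order]


# Tiny synthetic demo: one pure-red and one pure-blue 4x4 image.
red = np.zeros((4, 4, 3), dtype=np.uint8)
red[..., 0] = 255
blue = np.zeros((4, 4, 3), dtype=np.uint8)
blue[..., 2] = 255

index = np.stack([rgb_histogram_embedding(red), rgb_histogram_embedding(blue)])
results = rank_by_cosine(rgb_histogram_embedding(red), index)
print(results)  # the red image (row 0) ranks first
```

Histograms ignore where colors appear in the image, so this kind of embedding matches overall color distribution rather than shapes or objects.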

📊 Dataset Information

This repository expects a photos_url.csv with at least one column containing HTTP/HTTPS image URLs. Downloaded images are stored as JPEG and resized to fit within roughly 800×800 pixels to balance quality and performance.
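
As a rough illustration of the "at least one column containing HTTP/HTTPS image URLs" requirement, the sketch below picks the first column that holds any valid URL. It uses the standard-library `csv` module rather than pandas to stay dependency-free. The function names and the sample column names (`photo_id`, `photo_image_url`) are hypothetical, and app.py may select the column differently.

```python
import csv
import io
from urllib.parse import urlparse


def is_http_url(value: str) -> bool:
    """True if `value` parses as an http(s) URL with a host."""
    try:
        parsed = urlparse(value.strip())
    except ValueError:  # e.g. malformed IPv6 literals
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def extract_image_urls(csv_text: str, limit: int = 250) -> list[str]:
    """Return up to `limit` unique URLs from the first column that contains any."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    for column in reader.fieldnames or []:
        urls = [row[column].strip() for row in rows if is_http_url(row.get(column) or "")]
        if urls:
            # dict.fromkeys de-duplicates while preserving first-seen order
            return list(dict.fromkeys(urls))[:limit]
    return []


# Sample data with made-up column names and example.com URLs.
sample = (
    "photo_id,photo_image_url\n"
    "abc,https://example.com/a.jpg\n"
    "def,not-a-url\n"
    "ghi,https://example.com/a.jpg\n"
    "jkl,http://example.com/b.jpg\n"
)
urls = extract_image_urls(sample)
print(urls)
```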

🛠️ Technical Details

Dependencies

streamlit - web interface
pandas - CSV handling
requests - HTTP downloads
pillow - image processing
numpy - embeddings and similarity
tqdm - used by download_images.py (optional utility)

Performance Features

Parallel downloads with retries and exponential backoff
Atomic writes for embedding/index files to avoid corruption
Progress persisted to progress.json for resilience
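
The retry and atomic-write ideas above can be sketched like this. It is an illustration under assumptions, not the project's code: `retry_with_backoff` and `atomic_write_json` are hypothetical names, and the attempt count and base delay are made-up defaults.

```python
import json
import os
import tempfile
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, waiting base_delay * 2**n after the n-th failure (0.5s, 1s, 2s, ...)."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


def atomic_write_json(path: str, data: object) -> None:
    """Write JSON to a temp file in the same directory, then swap it into place.

    os.replace is atomic when source and destination are on the same filesystem,
    so readers see either the old file or the complete new one.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A download loop could wrap each HTTP request in `retry_with_backoff` and call `atomic_write_json("progress.json", state)` after each batch, so an interrupted run never leaves a half-written progress file behind.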

🔧 Configuration

Environment Variables

HF_TOKEN (optional): Hugging Face token to enable text search via Inference API.
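
One way to treat `HF_TOKEN` as optional is sketched below. The helper names are hypothetical and app.py may handle this differently. The `Authorization: Bearer <token>` header is the standard way to authenticate Hugging Face API requests.

```python
import os
from typing import Mapping, Optional


def get_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the token, or None when HF_TOKEN is unset or blank."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def auth_headers(token: str) -> dict[str, str]:
    """Build the bearer-token header for Hugging Face API requests."""
    return {"Authorization": f"Bearer {token}"}


token = get_hf_token()
text_search_enabled = token is not None
print("text search enabled:", text_search_enabled)
```

In the Streamlit UI, a `None` result is the natural place to show a warning and hide the text search box while leaving image upload search available.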

Customization (in `app.py`)

MAX_IMAGES: number of images to process (default 250)
MAX_WORKERS: parallel download workers (default 6)
TARGET_MAX_SIZE: image resize target (default 800×800)
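
To show what a resize target like `TARGET_MAX_SIZE` typically means in practice, here is a minimal Pillow sketch. The `optimize_to_jpeg` helper and the quality value of 85 are assumptions, not values taken from app.py.

```python
import io

from PIL import Image

TARGET_MAX_SIZE = (800, 800)  # mirrors the default listed above


def optimize_to_jpeg(image: Image.Image, max_size=TARGET_MAX_SIZE, quality: int = 85) -> bytes:
    """Shrink `image` to fit inside `max_size` and encode it as JPEG bytes."""
    rgb = image.convert("RGB")  # JPEG has no alpha channel; convert() returns a new image
    rgb.thumbnail(max_size)     # in place, keeps aspect ratio, never upscales
    buffer = io.BytesIO()
    rgb.save(buffer, format="JPEG", quality=quality, optimize=True)
    return buffer.getvalue()


# Demo on a synthetic 1600x1200 image with transparency.
original = Image.new("RGBA", (1600, 1200), (200, 30, 30, 255))
data = optimize_to_jpeg(original)
resized = Image.open(io.BytesIO(data))
print(resized.size, resized.format)
```

Because `thumbnail` only ever shrinks, images already smaller than the target keep their original dimensions.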

🚨 Troubleshooting

Common Issues

Space fails to start (HF Spaces)
- Ensure the SDK in the Space is set to Streamlit and this README has the metadata block
- Confirm app.py and requirements.txt exist at the repo root
Image download failures
- Check internet connection
- Verify photos_url.csv is present
- Check available disk space
- Reduce MAX_WORKERS if hitting rate limits
Text search not working
- Add HF_TOKEN as a Space secret
- Ensure the CLIP model endpoint is reachable

Performance Tips

Faster Downloads: Increase MAX_WORKERS in app.py (lower it again if you start hitting rate limits)
Memory Usage: Reduce MAX_DISPLAY_IMAGES for lower memory usage
Image Quality: Adjust JPEG quality in download_images.py

📝 License

This project is open source under the MIT license, as declared in the metadata header. Feel free to modify and distribute.

🤝 Contributing

Fork the repository
Create a feature branch
Make your changes
Submit a pull request

📞 Support

If you encounter issues:

Check the troubleshooting section above
Review the console output for error messages
Ensure all required files are present
Verify Python version compatibility

Built with ❤️ using Streamlit and Python