Update README.md
README.md
CHANGED
@@ -12,15 +12,15 @@ license: mit

  # Visual Search System

- A

  ## ✨ Features

- -
- -
- -
- -
- - **📱

  ## Quick Start

@@ -44,102 +44,83 @@ A comprehensive Streamlit application for browsing and searching through a large

  ### Hugging Face Spaces Deployment

- 1.
- 2.
- 3.
- - `
-
-
-
-
-
- The app will automatically:
- - Install dependencies
- - Check for downloaded images
- - Download missing images if needed
- - Launch the Streamlit interface

  ## Project Structure

  ```
  visual-search-system/
- ├── app.py # Main Streamlit application
- ├── download_images.py #
- ├── photos_url.csv
- ├── requirements.txt
- ├── README.md
- └── images/
  ```

  ## 🎯 How It Works

-
-
-
- -
-
- ### Range by Block
- - Each block contains 100 images
- - Enter a number between 1-250
- - Example: Block 100 shows images 10001-10100
-
- ### Image Management
- - Automatically detects existing images
- - Downloads missing images in parallel (20 workers)
- - Optimizes images to 800x800 pixels
- - Saves as compressed JPEGs

  ## Dataset Information

-
-
- - **Format**: JPEG, optimized for web
- - **Size**: Approximately 1.5GB total
- - **Resolution**: 800x800 pixels (maintains aspect ratio)

  ## 🛠️ Technical Details

  ### Dependencies
- - `streamlit` -
- - `pandas` -
- - `requests` - HTTP
- - `pillow` -
- - `

  ### Performance Features
- -
- -
- -
- - **Memory Efficient**: Processes images in chunks

  ## 🔧 Configuration

  ### Environment Variables
- -
- - All configuration is built-in

- ### Customization
- -
- -
- -

  ## 🚨 Troubleshooting

  ### Common Issues

- 1. **
- - Ensure
- -
- - Verify Streamlit SDK is selected

  2. **Image download failures**
  - Check internet connection
  - Verify `photos_url.csv` is present
  - Check available disk space

- 3. **
- -
- -

  ### Performance Tips

@@ -12,15 +12,15 @@ license: mit

  # Visual Search System

+ A Streamlit app that downloads images from `photos_url.csv`, builds lightweight visual embeddings, and lets you search by text (optional, via Hugging Face Inference API) or by uploading an image.

  ## ✨ Features

+ - **📥 Automatic downloads**: Pulls images from `photos_url.csv` with retries and optimization
+ - **Embeddings**: Creates simple, robust RGB histogram embeddings locally (no GPU needed)
+ - **Text search (optional)**: Uses `openai/clip-vit-base-patch32` via HF Inference API when `HF_TOKEN` is provided
+ - **Image similarity search**: Upload an image and find visually similar images
+ - **📱 Modern UI**: Streamlit interface with responsive layout and status tracking

  ## Quick Start

@@ -44,102 +44,83 @@ A comprehensive Streamlit application for browsing and searching through a large

  ### Hugging Face Spaces Deployment

+ 1. Create a new Space and select the SDK: `Streamlit`.
+ 2. Ensure this repository contains at least: `app.py`, `photos_url.csv`, `requirements.txt`, `README.md`.
+ 3. Optional: Set a Space secret named `HF_TOKEN` if you want text search enabled.
+ - In your Space, go to Settings → Secrets → Add `HF_TOKEN` (a valid Hugging Face token).
+ 4. Push/Upload files. The build will install `requirements.txt` and start `app.py` automatically.
+
+ Notes:
+ - You do NOT need a Dockerfile for Streamlit Spaces (the metadata header in this README is sufficient).
+ - Without `HF_TOKEN`, the app still works with image upload search; text search will be disabled with a warning.

  ## Project Structure

  ```
  visual-search-system/
+ ├── app.py # Main Streamlit application (entry point)
+ ├── download_images.py # Optional: standalone downloader utility
+ ├── photos_url.csv # Dataset with image URLs
+ ├── requirements.txt # Python dependencies
+ ├── README.md # This file (contains HF Spaces metadata)
+ └── images/ # Downloaded images (created automatically)
  ```

  ## 🎯 How It Works

+ 1. On first run, the app reads `photos_url.csv` and downloads up to 250 images (configurable).
+ 2. It creates local visual embeddings using RGB histograms and saves them to `embeddings/`.
+ 3. In the UI you can:
+ - Perform text search (requires `HF_TOKEN`) against `openai/clip-vit-base-patch32` via Inference API.
+ - Upload an image to find visually similar images using cosine similarity over local embeddings.
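
The embedding and similarity steps described in this hunk can be sketched roughly as follows. This is a minimal illustration and not the code in `app.py`: the bin count, the function names, and the pure-Python implementation are all assumptions (the README lists `numpy` for this work), and pixels are passed in as plain `(r, g, b)` tuples rather than decoded from image files.

```python
import math

BINS_PER_CHANNEL = 8  # hypothetical bin count; the README does not state one


def rgb_histogram(pixels, bins=BINS_PER_CHANNEL):
    """Build a normalized per-channel histogram from (r, g, b) pixels in 0-255."""
    hist = [0.0] * (3 * bins)
    count = 0
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            # Map 0-255 onto a bin index 0..bins-1 for this channel.
            hist[channel * bins + value * bins // 256] += 1.0
        count += 1
    if count == 0:
        return hist
    return [h / count for h in hist]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, index, top_k=3):
    """Rank a {name: embedding} index by similarity to the query embedding."""
    scored = [(cosine_similarity(query, emb), name) for name, emb in index.items()]
    return sorted(scored, reverse=True)[:top_k]
```

A colour histogram ignores where colours appear in the image, so two images with similar palettes but different content will score as close matches; that is the trade-off for needing no GPU or model download.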

  ## Dataset Information

+ This repository expects a `photos_url.csv` with at least one column containing HTTP/HTTPS image URLs.
+ Images are stored as JPEG, optimized to ~800×800 pixels to balance quality and performance.
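
As a rough illustration of resizing to about 800×800, the target dimensions can be computed by scaling the longer side down to the limit and leaving smaller images untouched. The helper name `fit_within` is hypothetical, and whether `app.py` preserves aspect ratio in exactly this way is an assumption; Pillow's `Image.thumbnail` behaves similarly.

```python
def fit_within(width, height, max_side=800):
    """Scale (width, height) to fit inside max_side x max_side, keeping aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Use the tighter of the two ratios, and cap at 1.0 so images are never upscaled.
    scale = min(max_side / width, max_side / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 1600×1200 source maps to 800×600, while a 400×300 source is returned unchanged.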

  ## 🛠️ Technical Details

  ### Dependencies
+ - `streamlit` - web interface
+ - `pandas` - CSV handling
+ - `requests` - HTTP downloads
+ - `pillow` - image processing
+ - `numpy` - embeddings and similarity
+ - `tqdm` - used by `download_images.py` (optional utility)

  ### Performance Features
+ - Parallel downloads with retries and exponential backoff
+ - Atomic writes for embedding/index files to avoid corruption
+ - Progress persisted to `progress.json` for resilience
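
Two of the techniques named in this list, retries with exponential backoff and atomic file writes, can be sketched with the standard library alone. The function names, default delays, and attempt counts below are hypothetical and are not taken from `app.py`; the download itself is abstracted into a callable so the sketch stays independent of `requests`.

```python
import json
import os
import tempfile
import time


def retry_with_backoff(func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(); after each failure wait base_delay * 2**n, then try again."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


def atomic_write_json(path, data):
    """Write JSON to a temp file in the target directory, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace swaps the file in a single step on POSIX, so a reader
        # sees either the old content or the new content, never a partial file.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Writing the temporary file into the same directory as the target keeps both on one filesystem, which the rename step relies on.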

  ## 🔧 Configuration

  ### Environment Variables
+ - `HF_TOKEN` (optional): Hugging Face token to enable text search via Inference API.
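
A minimal sketch of how an optional token like this might gate a feature, assuming the Space secret is exposed to the app as an environment variable. The helper name `text_search_enabled` is hypothetical and is not taken from `app.py`.

```python
import os


def text_search_enabled(env=None):
    """Return True when a non-empty HF_TOKEN is present in the environment."""
    env = os.environ if env is None else env
    # Treat a missing, empty, or whitespace-only value as "no token".
    return bool(env.get("HF_TOKEN", "").strip())
```

Passing the environment as a parameter keeps the check easy to exercise without touching the real process environment.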

+ ### Customization (in `app.py`)
+ - `MAX_IMAGES`: number of images to process (default 250)
+ - `MAX_WORKERS`: parallel download workers (default 6)
+ - `TARGET_MAX_SIZE`: image resize target (default 800×800)

  ## 🚨 Troubleshooting

  ### Common Issues

+ 1. **Space fails to start (HF Spaces)**
+ - Ensure the SDK in the Space is set to Streamlit and this README has the metadata block
+ - Confirm `app.py` and `requirements.txt` exist at the repo root

  2. **Image download failures**
  - Check internet connection
  - Verify `photos_url.csv` is present
  - Check available disk space
+ - Reduce `MAX_WORKERS` if hitting rate limits

+ 3. **Text search not working**
+ - Add `HF_TOKEN` as a Space secret
+ - Ensure the CLIP model endpoint is reachable

  ### Performance Tips
