---
title: Visual Search System
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: "1.37.0"
app_file: app.py
pinned: false
license: mit
---

# 🔍 Visual Search System

A Streamlit application for browsing and searching a dataset of 25,000+ high-quality images from Unsplash.

## ✨ Features

- **πŸ”Ž Search by ID**: Find specific images by their ID number
- **πŸ“¦ Browse by Block**: Navigate through images in organized blocks of 100
- **πŸ“₯ Automatic Downloads**: Automatically downloads missing images with parallel processing
- **πŸš€ Smart Dependencies**: Auto-installs required packages
- **πŸ“± Responsive UI**: Clean, modern interface optimized for all devices
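
The "Smart Dependencies" behavior can be approximated as shown here. This is a minimal sketch, not the code in `app.py`: the `REQUIRED` mapping (import name to pip package name) and the helpers `missing_packages` and `ensure_packages` are hypothetical. It checks importability with `importlib.util.find_spec` and installs anything missing through `python -m pip` for the running interpreter.

```python
import importlib.util
import subprocess
import sys
from typing import Dict, List

# Hypothetical mapping from import name to pip package name.
# Pillow is imported as "PIL" but installed as "pillow".
REQUIRED: Dict[str, str] = {
    "streamlit": "streamlit",
    "pandas": "pandas",
    "requests": "requests",
    "PIL": "pillow",
    "tqdm": "tqdm",
}


def missing_packages(required: Dict[str, str]) -> List[str]:
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


def ensure_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Install any missing packages and return the list that was installed."""
    missing = missing_packages(required)
    if missing:
        # sys.executable targets the same environment that is running the app.
        subprocess.check_call([sys.executable, "-m", "pip", "install", *missing])
    return missing
```

On Hugging Face Spaces the packages in `requirements.txt` are installed when the Space is built, so a check like this mostly matters for local runs.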

## 🚀 Quick Start

### Local Development

1. **Clone the repository:**
   ```bash
   git clone <your-repo-url>
   cd visual-search-system
   ```

2. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

3. **Run the app:**
   ```bash
   streamlit run app.py
   ```

### Hugging Face Spaces Deployment

1. **Create a new Space** on Hugging Face
2. **Choose Streamlit** as the SDK
3. **Upload these files:**
   - `app.py` (main application)
   - `download_images.py` (image downloading logic)
   - `photos_url.csv` (image dataset)
   - `requirements.txt` (dependencies)
   - `README.md` (this file)

The app will automatically:
- Install dependencies
- Check for downloaded images
- Download missing images if needed
- Launch the Streamlit interface
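
The "check for downloaded images" step amounts to comparing the dataset with what is already on disk. The sketch below is illustrative only: it assumes images are saved as `<id>.jpg` inside `images/`, and the CSV column names `photo_id` and `photo_image_url`, like the helper names, are hypothetical placeholders for whatever `photos_url.csv` and `download_images.py` actually use.

```python
import csv
import os
from typing import Dict, Iterable, List, Set, Tuple


def existing_image_ids(images_dir: str) -> Set[str]:
    """IDs of images already saved as '<id>.jpg' (empty set if the folder is absent)."""
    if not os.path.isdir(images_dir):
        return set()
    return {
        os.path.splitext(name)[0]
        for name in os.listdir(images_dir)
        if name.lower().endswith(".jpg")
    }


def find_missing(rows: Iterable[Dict[str, str]], existing: Set[str],
                 id_col: str = "photo_id",
                 url_col: str = "photo_image_url") -> List[Tuple[str, str]]:
    """Return (id, url) pairs for dataset rows whose image is not on disk yet."""
    return [(row[id_col], row[url_col]) for row in rows if row[id_col] not in existing]


def missing_from_csv(csv_path: str = "photos_url.csv", images_dir: str = "images",
                     id_col: str = "photo_id",
                     url_col: str = "photo_image_url") -> List[Tuple[str, str]]:
    """Read the dataset CSV and list the images that still need downloading."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        return find_missing(csv.DictReader(fh), existing_image_ids(images_dir),
                            id_col, url_col)
```

An empty result means every image is already present, so the download step can be skipped and the interface launched directly.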

## 📁 Project Structure

```
visual-search-system/
├── app.py                 # Main Streamlit application
├── download_images.py     # Image downloading utilities
├── photos_url.csv         # Dataset with 25,000+ image URLs
├── requirements.txt       # Python dependencies
├── README.md              # This file
└── images/                # Downloaded images (created automatically)
```

## 🎯 How It Works

### Search by ID
- Enter a specific image ID (e.g., "0001", "1234")
- Leave empty to browse the first 500 images
- Results update in real-time
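
The search behavior described above can be sketched as a simple filter. This is a minimal illustration, not the app's code: it assumes IDs are strings and that a query matches any ID containing it (the real app may use exact or prefix matching instead). `search_ids` is a hypothetical helper, and the value 500 for `MAX_DISPLAY_IMAGES` is inferred from the "first 500 images" default.

```python
from typing import List, Sequence

# Assumed value: an empty search shows the first 500 images, which is
# presumably the limit that MAX_DISPLAY_IMAGES in app.py controls.
MAX_DISPLAY_IMAGES = 500


def search_ids(all_ids: Sequence[str], query: str,
               limit: int = MAX_DISPLAY_IMAGES) -> List[str]:
    """Return IDs containing `query`; an empty query returns the first `limit` IDs."""
    q = query.strip()
    if not q:
        return list(all_ids[:limit])
    # Substring match, so a partial query like "12" already narrows the results.
    return [image_id for image_id in all_ids if q in image_id][:limit]


ids = [f"{n:04d}" for n in range(1, 1201)]  # "0001" ... "1200"
print(len(search_ids(ids, "")))  # 500
print(search_ids(ids, "0123"))   # ['0123']
```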

### Range by Block
- Each block contains 100 images
- Enter a number between 1 and 250
- Example: Block 100 shows images 9901-10000 (block *n* covers images (n-1)×100+1 to n×100)
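
The block arithmetic can be written out explicitly. This sketch assumes blocks are numbered from 1, so block 1 holds images 1-100 and the 250 blocks together cover images 1-25,000; `block_range`, `BLOCK_SIZE`, and `NUM_BLOCKS` are hypothetical names, not taken from `app.py`.

```python
from typing import Tuple

BLOCK_SIZE = 100  # images per block, as described above
NUM_BLOCKS = 250  # valid block numbers are 1..250


def block_range(block: int, block_size: int = BLOCK_SIZE,
                num_blocks: int = NUM_BLOCKS) -> Tuple[int, int]:
    """Return the first and last image number (inclusive) shown by a 1-indexed block."""
    if not 1 <= block <= num_blocks:
        raise ValueError(f"block must be between 1 and {num_blocks}, got {block}")
    start = (block - 1) * block_size + 1
    return start, block * block_size


print(block_range(1))    # (1, 100)
print(block_range(100))  # (9901, 10000)
print(block_range(250))  # (24901, 25000)
```

Validating the block number up front keeps an out-of-range entry from silently producing an empty page.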

### Image Management
- Automatically detects existing images
- Downloads missing images in parallel (20 workers)
- Resizes images to fit within 800x800 pixels, preserving aspect ratio
- Saves as compressed JPEGs
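
"Fit within 800x800" means scaling the longer side down to 800 pixels rather than stretching every image to a square. The dependency-free sketch below shows that arithmetic; `fit_within` is a hypothetical helper, not a function from `download_images.py`. With Pillow, `Image.thumbnail((800, 800))` performs this kind of fit-inside resize in place and never enlarges an image (its own rounding may differ by a pixel), and `save(path, "JPEG", quality=..., optimize=True)` then writes a compressed JPEG; the quality value the project uses is not stated here.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int = 800) -> Tuple[int, int]:
    """Scale (width, height) so that neither side exceeds max_side.

    The aspect ratio is preserved and images that are already small enough
    are returned unchanged (no upscaling).
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = min(max_side / width, max_side / height, 1.0)
    # Round to whole pixels, never letting a very thin image collapse to 0 px.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(1600, 1200))  # (800, 600): width hits the limit
print(fit_within(1200, 3000))  # (320, 800): height hits the limit
print(fit_within(640, 480))    # (640, 480): already small enough
```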

## 📊 Dataset Information

- **Total Images**: 25,000+
- **Source**: Unsplash (high-quality stock photos)
- **Format**: JPEG, optimized for web
- **Size**: Approximately 1.5 GB total
- **Resolution**: At most 800x800 pixels (aspect ratio preserved)

## 🛠️ Technical Details

### Dependencies
- `streamlit` - Web interface framework
- `pandas` - Data manipulation
- `requests` - HTTP requests for image downloads
- `pillow` - Image processing
- `tqdm` - Progress bars

### Performance Features
- **Parallel Downloads**: Uses ThreadPoolExecutor for speed
- **Retry Logic**: Handles failed downloads gracefully
- **Smart Caching**: Skips already downloaded images
- **Memory Efficient**: Processes images in chunks
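
The parallel download, retry, and caching behavior listed above can be sketched with `concurrent.futures.ThreadPoolExecutor`. This is an illustration under stated assumptions, not the code in `download_images.py`: `download_one`, `download_all`, and the injected `fetch` callable are hypothetical, images are assumed to be saved as `<id>.jpg`, and the linear backoff between retries is one possible choice. The default of 20 workers matches the figure given under "Image Management".

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple

Fetch = Callable[[str], bytes]  # takes a URL, returns image bytes or raises


def download_one(photo_id: str, url: str, out_dir: str, fetch: Fetch,
                 retries: int = 3, backoff: float = 0.5) -> str:
    """Download one image unless its file already exists; return the outcome."""
    path = os.path.join(out_dir, f"{photo_id}.jpg")
    if os.path.exists(path):
        return "skipped"  # caching: never re-download an existing file
    for attempt in range(1, retries + 1):
        try:
            data = fetch(url)
            tmp_path = path + ".part"
            with open(tmp_path, "wb") as fh:
                fh.write(data)
            os.replace(tmp_path, path)  # rename into place: no half-written .jpg files
            return "downloaded"
        except Exception:
            if attempt < retries:
                time.sleep(backoff * attempt)  # wait a bit longer before each retry
    return "failed"


def download_all(items: Iterable[Tuple[str, str]], out_dir: str, fetch: Fetch,
                 max_workers: int = 20, retries: int = 3,
                 backoff: float = 0.5) -> Dict[str, int]:
    """Download (photo_id, url) pairs concurrently and tally the outcomes."""
    os.makedirs(out_dir, exist_ok=True)
    counts = {"downloaded": 0, "skipped": 0, "failed": 0}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(download_one, pid, url, out_dir, fetch, retries, backoff)
                   for pid, url in items]
        for future in as_completed(futures):
            counts[future.result()] += 1
    return counts


# Demo with a fake fetcher so no network is needed; URLs ending in "bad" always fail.
def fake_fetch(url: str) -> bytes:
    if url.endswith("bad"):
        raise IOError("simulated network error")
    return b"not really a jpeg"


with tempfile.TemporaryDirectory() as demo_dir:
    demo_items = [("0001", "https://example.com/a"), ("0002", "https://example.com/bad")]
    print(download_all(demo_items, demo_dir, fake_fetch, retries=2, backoff=0))
    # {'downloaded': 1, 'skipped': 0, 'failed': 1}
```

Passing `fetch` in keeps the download logic testable offline. With `requests`, `fetch` could call `requests.get(url, timeout=30)`, then `raise_for_status()`, and return `.content`; the 30-second timeout is an arbitrary choice.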

## 🔧 Configuration

### Environment Variables
- No environment variables required
- All configuration is built-in

### Customization
- Modify `MAX_DISPLAY_IMAGES` in `app.py` to change display limit
- Adjust `max_workers` in the download functions to trade download speed against network and CPU load
- Change `target_size` for different image resolutions

## 🚨 Troubleshooting

### Common Issues

1. **"No application file found" on Hugging Face**
   - Ensure `app.py` is the main file (not `start_app.py`) and that the front matter at the top of this README sets `app_file: app.py`
   - Check that `requirements.txt` is present
   - Verify Streamlit SDK is selected

2. **Image download failures**
   - Check internet connection
   - Verify `photos_url.csv` is present
   - Check available disk space

3. **Dependency issues**
   - Ensure Python 3.8+ is used
   - Try updating pip: `pip install --upgrade pip`

### Performance Tips

- **Faster Downloads**: Increase `max_workers` in download functions
- **Memory Usage**: Reduce `MAX_DISPLAY_IMAGES` for lower memory usage
- **Image Quality**: Adjust JPEG quality in `download_images.py`

## 📝 License

This project is released under the MIT License (declared as `license: mit` in the front matter). Feel free to modify and distribute it under those terms.

## 🀝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request

## 📞 Support

If you encounter issues:
1. Check the troubleshooting section above
2. Review the console output for error messages
3. Ensure all required files are present
4. Verify Python version compatibility

---

**Built with ❤️ using Streamlit and Python**