title: Visual Search System
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.37.0
app_file: app.py
pinned: false
license: mit

🔍 Visual Search System

A Streamlit application for browsing and searching a dataset of 25,000+ high-quality images from Unsplash.

✨ Features

  • 🔎 Search by ID: Find specific images by their ID number
  • 📦 Browse by Block: Navigate through images in organized blocks of 100
  • 📥 Automatic Downloads: Downloads missing images automatically, using parallel processing
  • 🚀 Smart Dependencies: Auto-installs required packages
  • 📱 Responsive UI: Clean, modern interface optimized for all devices

🚀 Quick Start

Local Development

  1. Clone the repository:

    git clone <your-repo-url>
    cd visual-search-system
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Run the app:

    streamlit run app.py
    

Hugging Face Spaces Deployment

  1. Create a new Space on Hugging Face
  2. Choose Streamlit as the SDK
  3. Upload these files:
    • app.py (main application)
    • download_images.py (image downloading logic)
    • photos_url.csv (image dataset)
    • requirements.txt (dependencies)
    • README.md (this file)

The app will automatically:

  • Install dependencies
  • Check for downloaded images
  • Download missing images if needed
  • Launch the Streamlit interface

📁 Project Structure

visual-search-system/
├── app.py                 # Main Streamlit application
├── download_images.py     # Image downloading utilities
├── photos_url.csv         # Dataset with 25,000+ image URLs
├── requirements.txt       # Python dependencies
├── README.md              # This file
└── images/                # Downloaded images (created automatically)

🎯 How It Works

Search by ID

  • Enter a specific image ID (e.g., "0001", "1234")
  • Leave empty to browse the first 500 images
  • Results update in real-time
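The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: the function name `filter_ids`, the constant `MAX_DISPLAY`, and the assumption that IDs are zero-padded to at least four digits are all hypothetical, chosen to match the examples "0001" and "1234" and the 500-image default.

```python
from typing import List

MAX_DISPLAY = 500  # assumed to mirror the "first 500 images" default


def filter_ids(all_ids: List[str], query: str, limit: int = MAX_DISPLAY) -> List[str]:
    """Return the IDs matching `query`, or the first `limit` IDs if it is empty."""
    query = query.strip()
    if not query:
        return all_ids[:limit]
    # Normalise "12" and "0012" to the same zero-padded form (assumed width: 4).
    wanted = query.zfill(4)
    return [image_id for image_id in all_ids if image_id == wanted]


# Hypothetical ID list standing in for the IDs derived from photos_url.csv.
ids = [f"{n:04d}" for n in range(1, 1001)]
print(filter_ids(ids, "12"))     # ['0012']
print(len(filter_ids(ids, "")))  # 500
```

In a Streamlit app, the query string would typically come from a text input widget, which is what makes the results appear to update as the user types and submits.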

Browse by Block

  • Each block contains 100 images
  • Enter a block number from 1 to 250
  • Example: Block 100 shows images 9901-10000
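The block arithmetic can be written out as a short sketch. It assumes blocks are numbered from 1 and image numbers also start at 1; `block_range` and `BLOCK_SIZE` are illustrative names, not identifiers from app.py.

```python
from typing import Tuple

BLOCK_SIZE = 100  # images per block, as stated above


def block_range(block: int, block_size: int = BLOCK_SIZE) -> Tuple[int, int]:
    """Return the (first, last) image numbers shown for a 1-based block number."""
    if block < 1:
        raise ValueError("block numbers start at 1")
    first = (block - 1) * block_size + 1
    last = block * block_size
    return first, last


print(block_range(1))    # (1, 100)
print(block_range(250))  # (24901, 25000)
```

Under these assumptions, 250 blocks of 100 cover exactly the first 25,000 images in the dataset.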

Image Management

  • Automatically detects existing images
  • Downloads missing images in parallel (20 workers)
  • Resizes images to fit within 800x800 pixels
  • Saves as compressed JPEGs
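To make the resizing step concrete, here is a small sketch of how a target size can be computed so that an image fits inside an 800x800 box without distortion. The helper `fit_within` is hypothetical, and the choice never to upscale smaller images is an assumption; download_images.py may behave differently.

```python
from typing import Tuple

TARGET_SIZE = (800, 800)  # bounding box from the list above


def fit_within(width: int, height: int,
               target: Tuple[int, int] = TARGET_SIZE) -> Tuple[int, int]:
    """Scale (width, height) down to fit inside `target`, keeping the aspect ratio."""
    max_w, max_h = target
    # Use the tighter of the two constraints; cap at 1.0 so small images are not upscaled.
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4000, 3000))  # (800, 600)
print(fit_within(600, 400))    # (600, 400)
```

With Pillow, `Image.thumbnail((800, 800))` applies the same kind of aspect-preserving, shrink-only resize in place, which is one plausible way to implement this step.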

📊 Dataset Information

  • Total Images: 25,000+
  • Source: Unsplash (high-quality stock photos)
  • Format: JPEG, optimized for web
  • Size: Approximately 1.5GB total
  • Resolution: Up to 800x800 pixels (aspect ratio preserved)

🛠️ Technical Details

Dependencies

  • streamlit - Web interface framework
  • pandas - Data manipulation
  • requests - HTTP requests for image downloads
  • pillow - Image processing
  • tqdm - Progress bars

Performance Features

  • Parallel Downloads: Uses ThreadPoolExecutor for speed
  • Retry Logic: Handles failed downloads gracefully
  • Smart Caching: Skips already downloaded images
  • Memory Efficient: Processes images in chunks
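The first three points can be combined in one sketch: a thread pool runs the downloads, each download is retried a few times, and files already on disk are skipped. Everything here is an illustration under stated assumptions rather than the contents of download_images.py: the names `download_one` and `download_all`, the `<id>.jpg` file naming, the retry count, and the injected `fetch` callable are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, List


def download_one(image_id: str, url: str, out_dir: Path,
                 fetch: Callable[[str], bytes], retries: int = 3) -> bool:
    """Save one image; return True if the file exists afterwards."""
    target = out_dir / f"{image_id}.jpg"  # assumed naming scheme
    if target.exists():                   # caching: skip files already on disk
        return True
    for attempt in range(retries):
        try:
            target.write_bytes(fetch(url))
            return True
        except Exception:
            time.sleep(0.1 * (attempt + 1))  # short, growing pause before retrying
    return False


def download_all(urls: Dict[str, str], out_dir: Path,
                 fetch: Callable[[str], bytes], max_workers: int = 20) -> List[str]:
    """Download every id -> url pair in parallel; return the IDs that failed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(
            lambda item: download_one(item[0], item[1], out_dir, fetch),
            urls.items(),
        )
        # pool.map yields results in input order, so they line up with the IDs.
        return [image_id for image_id, ok in zip(urls, results) if not ok]
```

Passing `fetch` in as a parameter keeps the sketch independent of any HTTP library; with requests, it could be something like `lambda url: requests.get(url, timeout=10).content`.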

🔧 Configuration

Environment Variables

  • No environment variables required
  • All configuration is built-in

Customization

  • Modify MAX_DISPLAY_IMAGES in app.py to change display limit
  • Adjust max_workers in the download functions to change how many downloads run in parallel
  • Change target_size for different image resolutions

🚨 Troubleshooting

Common Issues

  1. "No application file found" on Hugging Face

    • Ensure app.py is the main file (not start_app.py)
    • Check that requirements.txt is present
    • Verify Streamlit SDK is selected
  2. Image download failures

    • Check internet connection
    • Verify photos_url.csv is present
    • Check available disk space
  3. Dependency issues

    • Ensure Python 3.8+ is used
    • Try updating pip: pip install --upgrade pip

Performance Tips

  • Faster Downloads: Increase max_workers in download functions
  • Memory Usage: Reduce MAX_DISPLAY_IMAGES for lower memory usage
  • Image Quality: Adjust JPEG quality in download_images.py

📝 License

This project is open source under the MIT license, as declared in the Space metadata. Feel free to modify and distribute.

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

📞 Support

If you encounter issues:

  1. Check the troubleshooting section above
  2. Review the console output for error messages
  3. Ensure all required files are present
  4. Verify Python version compatibility

Built with ❤️ using Streamlit and Python