Space: manisharma494/Virtual-Search-System

manisharma494 committed on Sep 3

Commit 795cdcd (verified) · Parent: 5f3ff4a

Upload 4 files

Files changed (4)

DEPLOYMENT.md +99 -0
README.md +171 -0
app.py +402 -0
requirements.txt +5 -0

DEPLOYMENT.md ADDED

	@@ -0,0 +1,99 @@

+# 🚀 Hugging Face Spaces Deployment Checklist
+## ✅ Pre-Deployment Checklist
+### 1. File Structure
+- [x] `app.py` - Main Streamlit application (entry point)
+- [x] `download_images.py` - Image downloading utilities
+- [x] `photos_url.csv` - Image dataset (25,000+ URLs)
+- [x] `requirements.txt` - Python dependencies
+- [x] `README.md` - Project documentation
+- [x] `.gitignore` - Clean repository
+### 2. File Names (Critical for Hugging Face)
+- [x] **Main app file**: `app.py` (NOT `start_app.py`)
+- [x] **Dependencies**: `requirements.txt` (PyPI package names, e.g. `pillow`, not the import name `PIL`)
+- [x] **Documentation**: `README.md`
+### 3. Code Verification
+- [x] App imports successfully
+- [x] Dependencies are correctly specified
+- [x] No syntax errors
+- [x] Proper error handling
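The file and file-name checks above can be automated with a small pre-flight script. This is a minimal sketch, assuming it is run from the Space's root folder; `preflight` and `read_front_matter` are hypothetical helpers that are not part of this repository, and the front-matter reader only handles flat `key: value` lines (it is not a YAML parser).

```python
from pathlib import Path

# Files this checklist expects in the Space root
REQUIRED_FILES = ["app.py", "download_images.py", "photos_url.csv",
                  "requirements.txt", "README.md"]


def read_front_matter(readme: Path) -> dict:
    """Parse flat `key: value` pairs between the leading `---` lines of README.md."""
    lines = readme.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front matter block
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    return meta


def preflight(root: str = ".") -> list:
    """Return a list of problems; an empty list means every check passed."""
    base = Path(root)
    problems = [f"missing file: {name}" for name in REQUIRED_FILES
                if not (base / name).is_file()]
    readme = base / "README.md"
    if readme.is_file():
        meta = read_front_matter(readme)
        if meta.get("sdk") != "streamlit":
            problems.append(f"README sdk is {meta.get('sdk')!r}, expected 'streamlit'")
        app_file = meta.get("app_file", "")
        if not (base / app_file).is_file():
            problems.append(f"app_file {app_file!r} does not exist")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("❌", problem)
```

An empty result means the required files are present and the front matter points Spaces at an existing Streamlit entry point.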
+## 🎯 Hugging Face Spaces Setup
+### Step 1: Create New Space
+1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
+2. Click "Create new Space"
+3. Choose an owner and a name for your Space
+4. **Select SDK**: `Streamlit` ⭐ (CRITICAL)
+5. **Select License**: Choose appropriate license
+6. Click "Create Space"
+### Step 2: Upload Files
+Upload these files in this exact order:
+1. `app.py` (main application)
+2. `download_images.py` (helper functions)
+3. `photos_url.csv` (dataset)
+4. `requirements.txt` (dependencies)
+5. `README.md` (documentation)
+### Step 3: Verify Deployment
+1. Check that the Space shows "Building" status
+2. Wait for build to complete (usually 2-5 minutes)
+3. Verify the app loads without "No application file found" error
+4. Test the interface functionality
+## 🔧 Troubleshooting Common Issues
+### Issue: "No application file found"
+**Solution**: Ensure `app.py` exists and the `app_file` field in the `README.md` front matter points to it (not `start_app.py`)
+### Issue: Build fails
+**Solution**: Check `requirements.txt` has correct package names
+### Issue: App loads but doesn't work
+**Solution**: Check console logs for Python errors
+### Issue: Images not downloading
+**Solution**: Verify `photos_url.csv` is present and accessible
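For the "Build fails" case, a quick local check can confirm that every name in `requirements.txt` resolves to an installed distribution. This is a sketch, assuming Python 3.8+ (where `importlib.metadata` is in the standard library); `requirement_names` and `missing_distributions` are hypothetical helpers. Note that distribution names can differ from import names: `pillow` is installed, but the code imports `PIL`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Leading distribution name of a requirement line such as "pillow>=9.0.0"
NAME_PATTERN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_names(text: str) -> list:
    """Extract package names from requirements.txt content, skipping comments and options."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop inline and full-line comments
        match = NAME_PATTERN.match(line)
        if match:  # lines like "-r other.txt" start with "-" and are skipped
            names.append(match.group(1))
    return names


def missing_distributions(names: list) -> list:
    """Return the names that have no installed distribution in this environment."""
    missing = []
    for name in names:
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    with open("requirements.txt", encoding="utf-8") as f:
        print(missing_distributions(requirement_names(f.read())))
```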
+## 📋 Post-Deployment Verification
+### 1. App Loading
+- [ ] App loads without errors
+- [ ] No "No application file found" message
+- [ ] Streamlit interface appears
+### 2. Functionality
+- [ ] Search by ID works
+- [ ] Range by Block works
+- [ ] Images display correctly
+- [ ] No Python errors in console
+### 3. Performance
+- [ ] App responds within reasonable time
+- [ ] Image downloads work (if needed)
+- [ ] No memory issues
+## 🎉 Success Indicators
+- ✅ **App loads successfully**
+- ✅ **No "No application file found" error**
+- ✅ **Streamlit interface appears**
+- ✅ **Search functionality works**
+- ✅ **Images display correctly**
+- ✅ **No Python errors in logs**
+## 📞 If Issues Persist
+1. **Check Space logs** in Hugging Face interface
+2. **Verify file names** match exactly
+3. **Ensure Streamlit SDK** is selected
+4. **Check requirements.txt** format
+5. **Verify app.py** is the main entry point
+---
+**Your app should now deploy successfully on Hugging Face Spaces! 🚀**

README.md ADDED

	@@ -0,0 +1,171 @@

+---
+title: Visual Search System
+emoji: 🔍
+colorFrom: blue
+colorTo: green
+sdk: streamlit
+sdk_version: "1.37.0"
+app_file: app.py
+pinned: false
+license: mit
+---
+# 🔍 Visual Search System
+A comprehensive Streamlit application for browsing and searching through a large dataset of high-quality images from Unsplash.
+## ✨ Features
+- **🔎 Search by ID**: Find specific images by their ID number
+- **📦 Browse by Block**: Navigate through images in organized blocks of 100
+- **📥 Automatic Downloads**: Automatically downloads missing images with parallel processing
+- **🚀 Smart Dependencies**: Installs missing helper packages (such as `requests`, `pillow` and `tqdm`) at startup
+- **📱 Responsive UI**: Clean, modern interface optimized for all devices
+## 🚀 Quick Start
+### Local Development
+1. **Clone the repository:**
+   ```bash
+   git clone <your-repo-url>
+   cd visual-search-system
+   ```
+2. **Install dependencies:**
+   ```bash
+   pip install -r requirements.txt
+   ```
+3. **Run the app:**
+   ```bash
+   streamlit run app.py
+   ```
+### Hugging Face Spaces Deployment
+1. **Create a new Space** on Hugging Face
+2. **Choose Streamlit** as the SDK
+3. **Upload these files:**
+   - `app.py` (main application)
+   - `download_images.py` (image downloading logic)
+   - `photos_url.csv` (image dataset)
+   - `requirements.txt` (dependencies)
+   - `README.md` (this file)
+The app will automatically:
+- Install dependencies
+- Check for downloaded images
+- Download missing images if needed
+- Launch the Streamlit interface
+## 📁 Project Structure
+```
+visual-search-system/
+├── app.py                 # Main Streamlit application
+├── download_images.py     # Image downloading utilities
+├── photos_url.csv        # Dataset with 25,000+ image URLs
+├── requirements.txt      # Python dependencies
+├── README.md            # This file
+└── images/              # Downloaded images (created automatically)
+```
+## 🎯 How It Works
+### Search by ID
+- Enter a specific image ID (e.g., "0001", "1234")
+- Leave empty to browse the first 500 images
+- Results update once you press Enter or click **🔍 Search**
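Why "1" and "0001" find the same image: `app.py` zero-pads numeric IDs to four digits when building filenames, and partial search compares the term against unpadded numbers. The sketch below reproduces both rules without touching the filesystem; `id_to_filename` and `partial_matches` are illustrative helpers, not functions in `app.py`, and `isdecimal()` is used here instead of `isdigit()` so `int()` never fails.

```python
def id_to_filename(image_id: str) -> str:
    """Apply app.py's zero-padding rule: numeric IDs become at least 4 digits."""
    image_id = image_id.strip()
    return f"{int(image_id):04d}.jpg" if image_id.isdecimal() else f"{image_id}.jpg"


def partial_matches(search_id: str, total: int = 25000, limit: int = 500) -> list:
    """IDs whose unpadded decimal string contains the search term, capped at `limit`."""
    term = search_id.strip().lower()
    return [i for i in range(1, total + 1) if term in str(i)][:limit]


print(id_to_filename("1"), id_to_filename("12345"))
print(partial_matches("1234")[:3])
```

Running it prints `0001.jpg 12345.jpg` and `[1234, 11234, 12340]`: a search for `1234` also returns IDs that merely contain those digits.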
+### Range by Block
+- Each block contains 100 images
+- Enter a number between 1 and 250
+- Example: Block 100 shows images 9901-10000
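The block arithmetic used in `app.py` (`start_num = (block_number - 1) * IMAGES_PER_BLOCK + 1`, `end_num = block_number * IMAGES_PER_BLOCK`) can be checked with a standalone sketch. `block_range` and `block_of` are illustrative helpers, not functions in `app.py`; the constants copy the app's configuration.

```python
from typing import Tuple

IMAGES_PER_BLOCK = 100  # same values as app.py's configuration
TOTAL_BLOCKS = 250


def block_range(block_number: int) -> Tuple[int, int]:
    """Return the first and last image number in a 1-based block."""
    if not 1 <= block_number <= TOTAL_BLOCKS:
        raise ValueError(f"block must be between 1 and {TOTAL_BLOCKS}")
    start = (block_number - 1) * IMAGES_PER_BLOCK + 1
    end = block_number * IMAGES_PER_BLOCK
    return start, end


def block_of(image_number: int) -> int:
    """Inverse mapping: the block that contains a given image number."""
    return (image_number - 1) // IMAGES_PER_BLOCK + 1


print(block_range(100))  # (9901, 10000)
print(block_of(10001))   # 101
```

Image 10001 is therefore the first image of block 101, not block 100.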
+### Image Management
+- Automatically detects existing images
+- Downloads missing images in parallel (20 workers)
+- Resizes images to fit within 800x800 pixels
+- Saves as compressed JPEGs
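`download_images.py` is not part of this commit, so its exact resize code cannot be confirmed here. The sketch below shows only the aspect-preserving arithmetic the README describes: scale down until both sides fit inside 800x800, and never upscale. `fit_within` is a hypothetical helper. In Pillow, `Image.thumbnail((800, 800))` performs this kind of in-place, aspect-preserving downscale, though its rounding may differ slightly from this sketch.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_w: int = 800, max_h: int = 800) -> Tuple[int, int]:
    """Scale (width, height) to fit inside max_w x max_h, keeping the aspect ratio.

    Images already inside the box keep their size (the scale factor is capped at 1.0).
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(1600, 1200))  # (800, 600)
print(fit_within(3000, 1000))  # (800, 267)
```

So "800x800" is an upper bound: a landscape photo ends up 800 pixels wide and shorter than 800 pixels tall.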
+## 📊 Dataset Information
+- **Total Images**: 25,000+
+- **Source**: Unsplash (high-quality stock photos)
+- **Format**: JPEG, optimized for web
+- **Size**: Approximately 1.5GB total
+- **Resolution**: Up to 800x800 pixels (aspect ratio preserved)
+## 🛠️ Technical Details
+### Dependencies
+- `streamlit` - Web interface framework
+- `pandas` - Data manipulation
+- `requests` - HTTP requests for image downloads
+- `pillow` - Image processing
+- `tqdm` - Progress bars
+### Performance Features
+- **Parallel Downloads**: Uses ThreadPoolExecutor for speed
+- **Retry Logic**: Handles failed downloads gracefully
+- **Smart Caching**: Skips already downloaded images
+- **Memory Efficient**: Processes images in chunks
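The parallel download, retry and skip-existing features listed above can be sketched with `concurrent.futures.ThreadPoolExecutor`. This is not the code in `download_images.py` (which is not included in this commit); `download_one`, `download_all` and the injected `fetch` callable are hypothetical names, and `fetch` is passed in so the sketch runs offline. Each file is written to a `.part` name first and renamed only after a full write, so an interrupted download is never mistaken for a cached image.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Dict


def download_one(url: str, dest: Path, fetch: Callable[[str], bytes],
                 retries: int = 3, backoff: float = 0.5) -> str:
    """Download url to dest, skipping files that already exist; retry on errors."""
    if dest.exists():
        return "skipped"  # caching: never re-download an existing file
    tmp = dest.with_name(dest.name + ".part")
    for attempt in range(1, retries + 1):
        try:
            tmp.write_bytes(fetch(url))
            tmp.replace(dest)  # rename only after the full write succeeded
            return "ok"
        except Exception:  # any network or I/O error counts as a failed attempt
            if attempt == retries:
                return "failed"
            time.sleep(backoff * attempt)  # linear backoff between attempts
    return "failed"


def download_all(jobs: Dict[str, str], out_dir: str, fetch: Callable[[str], bytes],
                 max_workers: int = 20, retries: int = 3,
                 backoff: float = 0.5) -> Dict[str, int]:
    """Download {filename: url} jobs in parallel and count the outcomes."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    counts = {"ok": 0, "skipped": 0, "failed": 0}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(download_one, url, out / name, fetch, retries, backoff)
                   for name, url in jobs.items()]
        for future in as_completed(futures):
            counts[future.result()] += 1
    return counts
```

In real use, `fetch` could wrap `requests.get(url, timeout=30)`, call `raise_for_status()` on the response, and return its `.content`. Threads suit this workload because the time is spent waiting on the network rather than on Python code.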
+## 🔧 Configuration
+### Environment Variables
+- No environment variables required
+- All configuration is built-in
+### Customization
+- Modify `MAX_DISPLAY_IMAGES` in `app.py` to change display limit
+- Adjust `max_workers` in download functions for different performance
+- Change `target_size` for different image resolutions
+## 🚨 Troubleshooting
+### Common Issues
+1. **"No application file found" on Hugging Face**
+   - Ensure `app.py` exists and the front matter above sets `app_file: app.py` (not `start_app.py`)
+   - Check that `requirements.txt` is present
+   - Verify Streamlit SDK is selected
+2. **Image download failures**
+   - Check internet connection
+   - Verify `photos_url.csv` is present
+   - Check available disk space
+3. **Dependency issues**
+   - Ensure Python 3.8+ is used
+   - Try updating pip: `pip install --upgrade pip`
+### Performance Tips
+- **Faster Downloads**: Increase `max_workers` in download functions
+- **Memory Usage**: Reduce `MAX_DISPLAY_IMAGES` for lower memory usage
+- **Image Quality**: Adjust JPEG quality in `download_images.py`
+## 📝 License
+This project is licensed under the MIT License (`license: mit` in the front matter). Feel free to modify and distribute.
+## 🤝 Contributing
+1. Fork the repository
+2. Create a feature branch
+3. Make your changes
+4. Submit a pull request
+## 📞 Support
+If you encounter issues:
+1. Check the troubleshooting section above
+2. Review the console output for error messages
+3. Ensure all required files are present
+4. Verify Python version compatibility
+---
+**Built with ❤️ using Streamlit and Python**

app.py ADDED

	@@ -0,0 +1,402 @@

+#!/usr/bin/env python3
+"""
+Visual Search System - Complete Streamlit App
+============================================
+A comprehensive Streamlit application that:
+1. Automatically installs required dependencies
+2. Downloads images from photos_url.csv if needed
+3. Provides a clean UI for searching and viewing images
+4. Supports both search by ID and range by block functionality
+Requirements:
+- photos_url.csv: Contains image URLs
+- download_images.py: Contains parallel downloading logic
+- images/ folder: Will be created and populated with downloaded images
+Usage:
+    streamlit run app.py
+Hugging Face Deployment:
+    This app is configured for Hugging Face Spaces deployment.
+    Upload all files and it will run automatically.
+"""
+import os
+import sys
+import subprocess
+import importlib
+from pathlib import Path
+import pandas as pd
+import streamlit as st
+from typing import List, Tuple, Optional
+# Configuration
+REQUIRED_PACKAGES = [
+    "streamlit",
+    "pandas",
+    "requests",
+    "PIL",
+    "tqdm"
+]
+IMAGES_DIR = "images"
+CSV_FILE = "photos_url.csv"
+DOWNLOAD_SCRIPT = "download_images.py"
+MAX_DISPLAY_IMAGES = 500
+IMAGES_PER_BLOCK = 100
+TOTAL_BLOCKS = 250
+def install_package(package: str) -> bool:
+    """
+    Install a Python package using pip
+    Args:
+        package: Package name to install
+    Returns:
+        True if successful, False otherwise
+    """
+    try:
+        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
+        return True
+    except subprocess.CalledProcessError:
+        return False
+# pip distribution names that differ from their import names (`import PIL` comes from `pip install pillow`)
+PIP_NAMES = {"PIL": "pillow"}
+def check_and_install_dependencies() -> bool:
+    """
+    Check if required packages are importable, install any that are missing
+    Note: streamlit and pandas are imported at the top of this file, so they
+    must already be installed (e.g. from requirements.txt) for the app to start.
+    Returns:
+        True if all dependencies are available, False otherwise
+    """
+    print("🔍 Checking dependencies...")
+    missing_packages = []
+    for package in REQUIRED_PACKAGES:
+        try:
+            importlib.import_module(package)
+            print(f"✅ {package} is already installed")
+        except ImportError:
+            print(f"📦 {package} is missing")
+            missing_packages.append(package)
+    if missing_packages:
+        print(f"🚀 Installing {len(missing_packages)} missing packages...")
+        for package in missing_packages:
+            pip_name = PIP_NAMES.get(package, package)  # e.g. "PIL" -> "pillow"
+            print(f"📥 Installing {pip_name}...")
+            if install_package(pip_name):
+                print(f"✅ Successfully installed {pip_name}")
+            else:
+                print(f"❌ Failed to install {pip_name}")
+                return False
+        # Let the import system see packages installed while this process is running
+        importlib.invalidate_caches()
+        # Verify installations
+        for package in missing_packages:
+            try:
+                importlib.import_module(package)
+                print(f"✅ {package} verified after installation")
+            except ImportError:
+                print(f"❌ {package} still not available after installation")
+                return False
+    print("✅ All dependencies are available!")
+    return True
+def check_images_status() -> Tuple[bool, int, int]:
+    """
+    Check the status of downloaded images
+    Returns:
+        Tuple of (is_complete, current_count, total_count)
+    """
+    images_path = Path(IMAGES_DIR)
+    # Count existing images (0 if the folder has not been created yet)
+    current_count = len(list(images_path.glob("*.jpg"))) if images_path.exists() else 0
+    # Read the total from the CSV even when no images exist yet, so progress messages are accurate
+    try:
+        df = pd.read_csv(CSV_FILE)
+        total_count = len(df)
+    except Exception as e:
+        print(f"❌ Error reading {CSV_FILE}: {e}")
+        return False, current_count, 0
+    is_complete = current_count >= total_count * 0.95  # Consider complete if 95%+ downloaded
+    return is_complete, current_count, total_count
+def download_images_if_needed() -> bool:
+    """
+    Download images if they're missing or incomplete
+    Returns:
+        True if images are available, False otherwise
+    """
+    print("🔍 Checking image status...")
+    is_complete, current_count, total_count = check_images_status()
+    if is_complete:
+        print(f"✅ Images are ready! Have {current_count:,} of {total_count:,} images")
+        return True
+    print(f"📥 Images incomplete: {current_count:,} of {total_count:,} available")
+    print("🚀 Starting image download...")
+    try:
+        # Import download functions from download_images.py
+        sys.path.append('.')
+        from download_images import download_images
+        success = download_images(
+            num_images=None,  # Download all images
+            output_dir=IMAGES_DIR,
+            max_workers=20
+        )
+        if success:
+            print("✅ Image download completed successfully!")
+            return True
+        else:
+            print("⚠️ Image download had some issues, but continuing...")
+            return True
+    except Exception as e:
+        print(f"❌ Error during image download: {e}")
+        return False
+def get_image_path(image_id: str) -> Optional[str]:
+    """
+    Get the file path for a given image ID
+    Args:
+        image_id: Image ID (e.g., "0001", "1234")
+    Returns:
+        File path if exists, None otherwise
+    """
+    image_id = image_id.strip()
+    try:
+        # Numeric IDs map to zero-padded filenames, e.g. "1" -> "0001.jpg"
+        filename = f"{int(image_id):04d}.jpg" if image_id.isdigit() else f"{image_id}.jpg"
+    except ValueError:
+        # str.isdigit() accepts some characters that int() rejects (e.g. "²")
+        return None
+    image_path = os.path.join(IMAGES_DIR, filename)
+    return image_path if os.path.exists(image_path) else None
+def get_block_images(block_number: int) -> List[str]:
+    """
+    Get all images for a specific block
+    Args:
+        block_number: Block number (1-250)
+    Returns:
+        List of image paths for the block
+    """
+    if not (1 <= block_number <= TOTAL_BLOCKS):
+        return []
+    # Calculate start and end image numbers for this block
+    start_num = (block_number - 1) * IMAGES_PER_BLOCK + 1
+    end_num = block_number * IMAGES_PER_BLOCK
+    image_paths = []
+    for i in range(start_num, end_num + 1):
+        image_path = get_image_path(str(i))
+        if image_path:
+            image_paths.append(image_path)
+    return image_paths
+def search_images_by_id(search_id: str) -> List[str]:
+    """
+    Search for images by ID
+    Args:
+        search_id: Search term (can be partial)
+    Returns:
+        List of matching image paths (at most MAX_DISPLAY_IMAGES)
+    """
+    search_id = search_id.strip()
+    if not search_id:
+        # Return the first MAX_DISPLAY_IMAGES images if no search term
+        return [path for i in range(1, MAX_DISPLAY_IMAGES + 1)
+                if (path := get_image_path(str(i)))]
+    matching_paths = []
+    # Try exact match first
+    exact_path = get_image_path(search_id)
+    if exact_path:
+        matching_paths.append(exact_path)
+    # Search for partial matches across every ID in the dataset
+    term = search_id.lower()
+    for i in range(1, TOTAL_BLOCKS * IMAGES_PER_BLOCK + 1):
+        if term not in str(i):
+            continue  # cheap string test first, so only candidate IDs touch the filesystem
+        image_path = get_image_path(str(i))
+        if image_path and image_path not in matching_paths:
+            matching_paths.append(image_path)
+            if len(matching_paths) >= MAX_DISPLAY_IMAGES:
+                break
+    return matching_paths
+def display_image_grid(image_paths: List[str], title: str):
+    """
+    Display a grid of images using Streamlit
+    Args:
+        image_paths: List of image file paths
+        title: Title for the image grid
+    """
+    if not image_paths:
+        st.warning("No images found matching your criteria.")
+        return
+    st.subheader(f"{title} ({len(image_paths)} images)")
+    # Create columns for the grid (3 columns)
+    cols = st.columns(3)
+    for idx, image_path in enumerate(image_paths):
+        col_idx = idx % 3
+        with cols[col_idx]:
+            try:
+                st.image(image_path, caption=f"Image {os.path.basename(image_path)}", use_column_width=True)
+            except Exception as e:
+                st.error(f"Error loading image: {e}")
+def main():
+    """Main Streamlit application"""
+    # Page configuration
+    st.set_page_config(
+        page_title="Visual Search System",
+        page_icon="🔍",
+        layout="wide",
+        initial_sidebar_state="expanded"
+    )
+    # Main title
+    st.title("🔍 Visual Search System")
+    st.markdown("---")
+    # Sidebar for navigation
+    st.sidebar.header("Navigation")
+    search_option = st.sidebar.selectbox(
+        "Choose search method:",
+        ["Search by ID", "Range by Block"]
+    )
+    # Main content area
+    if search_option == "Search by ID":
+        st.header("🔎 Search Images by ID")
+        # Search input
+        search_id = st.text_input(
+            "Enter image ID (e.g., '0001', '1234') or leave empty to see first 500 images:",
+            placeholder="Enter ID or leave empty",
+            help="Enter a specific image ID or leave empty to browse the first 500 images"
+        )
+        # Search button
+        if st.button("🔍 Search", type="primary") or search_id != "":
+            with st.spinner("Searching images..."):
+                matching_images = search_images_by_id(search_id)
+                if matching_images:
+                    display_image_grid(
+                        matching_images,
+                        f"Showing {len(matching_images)} matching images"
+                    )
+                else:
+                    st.info("No images found matching your search criteria.")
+    else:  # Range by Block
+        st.header("📦 Browse Images by Block")
+        st.markdown(f"""
+        **How it works:**
+        - Each block contains **{IMAGES_PER_BLOCK} images**
+        - Enter a number between **1 and {TOTAL_BLOCKS}**
+        - Example: Enter **100** to see images **9901-10000**
+        """)
+        # Block input
+        block_number = st.number_input(
+            f"Enter block number (1-{TOTAL_BLOCKS}):",
+            min_value=1,
+            max_value=TOTAL_BLOCKS,
+            value=1,
+            step=1,
+            help=f"Choose a block number from 1 to {TOTAL_BLOCKS}"
+        )
+        # Calculate and display block info
+        start_num = (block_number - 1) * IMAGES_PER_BLOCK + 1
+        end_num = block_number * IMAGES_PER_BLOCK
+        st.info(f"**Block {block_number}**: Images {start_num:,} to {end_num:,}")
+        # Get block images
+        with st.spinner(f"Loading block {block_number}..."):
+            block_images = get_block_images(block_number)
+            if block_images:
+                display_image_grid(
+                    block_images,
+                    f"Block {block_number} - Images {start_num:,} to {end_num:,}"
+                )
+            else:
+                st.warning(f"No images found for block {block_number}.")
+    # Footer
+    st.markdown("---")
+    st.markdown(
+        "**Dataset Info:** 25,000+ high-quality images from Unsplash | "
+        "Built with Streamlit and Python"
+    )
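+@st.cache_resource(show_spinner=False)
+def prepare_environment() -> str:
+    """
+    Install dependencies and download images once per server process.
+    Streamlit re-executes this script on every widget interaction, so the result
+    is cached to avoid re-checking (or re-downloading) on each rerun.
+    Returns:
+        "" on success, otherwise an error message
+    """
+    if not check_and_install_dependencies():
+        return "❌ Failed to install dependencies. Exiting."
+    print("✅ Dependencies ready!")
+    if not download_images_if_needed():
+        return "❌ Failed to prepare images. Exiting."
+    print("✅ Images ready!")
+    return ""
+def setup_and_run():
+    """Setup dependencies and run the app"""
+    print("🚀 Starting Visual Search System...")
+    # Steps 1-2: install dependencies, then check and download images (cached)
+    error = prepare_environment()
+    if error:
+        print(error)
+        sys.exit(1)
+    # Step 3: Launch Streamlit app
+    print("🚀 Launching Streamlit app...")
+    main()
+if __name__ == "__main__":
+    setup_and_run()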

requirements.txt ADDED

	@@ -0,0 +1,5 @@

+streamlit>=1.28.0
+pandas>=1.5.0
+requests>=2.28.0
+pillow>=9.0.0
+tqdm>=4.64.0