metadata
title: SAM-Grounding-DINO
emoji: 🎭
colorFrom: indigo
colorTo: purple
sdk: gradio
app_file: app.py

🎭 SAM 2.1 + Grounding DINO Interactive Segmentation

A web application that combines Meta's SAM 2.1 with IDEA Research's Grounding DINO for text-based and point-based image segmentation, so you can create and download exactly the mask you need.

✨ Features

  • 🔍 Text-Based Segmentation: Type what you want to segment (e.g., "snoopy", "person", "car")
  • 📍 Point-Based Segmentation: Click on objects for precise manual control
  • 🎭 Multiple Mask Generation: Generate 1-5 masks and browse through them
  • 🤖 SAM 2.1 + Grounding DINO: Powered by Meta's SAM 2.1 and IDEA Research's Grounding DINO
  • 📱 Smart Auto-Detection: Automatically chooses between text and point modes
  • 💾 Multiple Export Formats: Download masks as PNG, JPG, or PyTorch tensors
  • 🖼️ High-Resolution Display: View images and masks in full detail
  • ⚡ Real-Time Processing: Fast inference with GPU acceleration

🚀 Quick Start

Installation

  1. Clone or download the repository
  2. Install dependencies:
    pip install -r requirements.txt
    

Running the App

streamlit run streamlit_sam_app.py

The app will open in your browser at http://localhost:8501

🎯 How to Use

1. Upload an Image

  • Click "📷 Upload an image" to select an image file
  • Supported formats: JPG, JPEG, PNG, BMP

2. Add Points

Choose between Positive (include) or Negative (exclude) point mode:

Quick Presets:

  • 🎯 Center: Add point at image center
  • ↖️ Top-Left: Add point at top-left quarter
  • ↗️ Top-Right: Add point at top-right quarter
  • 🎲 Random: Add random point anywhere
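Each preset boils down to a coordinate derived from the image size. The following is a minimal sketch, assuming that "quarter" means the center of the corresponding quadrant; `preset_point` is a hypothetical helper for illustration, not necessarily how the app computes these points.

```python
import random

def preset_point(preset, width, height, rng=random):
    """Return an (x, y) pixel coordinate for a named preset (hypothetical helper)."""
    if preset == "center":
        return width // 2, height // 2
    if preset == "top-left":
        # Assumption: the point sits at the center of the top-left quadrant.
        return width // 4, height // 4
    if preset == "top-right":
        # Assumption: the point sits at the center of the top-right quadrant.
        return (3 * width) // 4, height // 4
    if preset == "random":
        # Any pixel inside the image, chosen uniformly.
        return rng.randrange(width), rng.randrange(height)
    raise ValueError(f"unknown preset: {preset!r}")
```

For a 640x480 image, `preset_point("center", 640, 480)` gives `(320, 240)`.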

Manual Input:

  • Enter X,Y coordinates manually
  • Points are validated against image boundaries
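Boundary validation of a manually entered point could look roughly like this. It is a sketch under the assumption that coordinates are zero-based integer pixels entered as "X,Y"; `parse_point` is a hypothetical name, not taken from the app.

```python
def parse_point(text, width, height):
    """Parse an "X,Y" string and check that it lies inside a width x height image."""
    parts = text.split(",")
    if len(parts) != 2:
        raise ValueError("expected two comma-separated values, e.g. '120,80'")
    # int() tolerates surrounding whitespace and raises ValueError on non-numbers.
    x, y = int(parts[0]), int(parts[1])
    # Assumption: valid pixels are 0 <= x < width and 0 <= y < height.
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError(f"point ({x}, {y}) is outside the {width}x{height} image")
    return x, y
```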

3. Generate Segmentation Mask

  • Click "🎯 Generate Segmentation Mask"
  • Adjust the mask threshold in the sidebar (0.0-1.0)
  • Wait for SAM 2.0 to process (may take 10-30 seconds)

4. View Results

  • Original Image with Points: Shows your input selections
  • Generated Segmentation Mask: Red overlay on original image
  • Binary Mask Preview: Black/white mask for download
  • Statistics: Pixel counts and coverage percentage
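The statistics are simple counts over the binary mask. Here is a small sketch using plain nested lists for readability (the app itself presumably works on NumPy or PyTorch arrays); the function name and dictionary keys are illustrative assumptions.

```python
def mask_statistics(mask):
    """Count foreground pixels in a 2D 0/1 mask and compute coverage in percent."""
    total = sum(len(row) for row in mask)
    foreground = sum(1 for row in mask for value in row if value)
    coverage = 100.0 * foreground / total if total else 0.0
    return {
        "mask_pixels": foreground,     # pixels inside the mask
        "total_pixels": total,         # all pixels in the image
        "coverage_percent": coverage,  # share of the image that is masked
    }
```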

5. Download Results

  • 📥 Download Mask (PNG): Binary mask file
  • 📥 Download Overlay (PNG): Mask overlaid on original
  • 📥 Download Data (JSON): Complete metadata and statistics
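The JSON download bundles the inputs and the resulting statistics. The exact schema is not documented here, so the layout below is purely an assumed example of what such a file might contain; `build_export` and all key names are hypothetical.

```python
import json

def build_export(points, threshold, stats):
    """Serialize points, threshold and statistics to a JSON string (assumed schema)."""
    payload = {
        # Each point is (x, y, label), where label is "positive" or "negative".
        "points": [{"x": x, "y": y, "label": label} for x, y, label in points],
        "mask_threshold": threshold,
        "statistics": stats,
    }
    return json.dumps(payload, indent=2)
```

Because the result is ordinary JSON, it can be read back with `json.loads` for later analysis.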

🎛️ Advanced Controls

Sidebar Options:

  • Point Mode: Switch between Positive/Negative points
  • Mask Threshold: Control mask sensitivity (lower = larger masks)
  • Clear Points: Remove all points at once
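To see why a lower threshold yields larger masks, consider a toy example. It assumes the model output is a grid of per-pixel probabilities in the range 0.0-1.0 and that a pixel is kept when its value exceeds the threshold; the real app may apply the threshold differently.

```python
def binarize(probabilities, threshold=0.5):
    """Turn a 2D grid of per-pixel probabilities into a 0/1 mask."""
    return [[1 if p > threshold else 0 for p in row] for row in probabilities]

probs = [
    [0.1, 0.4, 0.9],
    [0.3, 0.6, 0.8],
]
strict = binarize(probs, threshold=0.7)  # keeps only the most confident pixels
loose = binarize(probs, threshold=0.2)   # keeps more pixels, so the mask grows
```

Every pixel that passes the strict threshold also passes the loose one, so lowering the threshold can only grow the mask.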

Point Management:

  • View all current points with coordinates
  • Delete individual points with 🗑️ buttons
  • Real-time count of positive/negative points

🔧 Technical Details

SAM 2.0 Model

  • Uses facebook/sam2-hiera-small by default
  • Automatically downloads model weights on first run
  • Runs on GPU if available, CPU otherwise
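The GPU/CPU fallback is typically a one-line check in PyTorch. The sketch below shows the common pattern; `pick_device` is a hypothetical helper, and the import guard is only there so the snippet also runs where PyTorch is not installed.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to run on a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
```

The resulting string can be passed to `model.to(device)` and to tensor constructors.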

Dependencies

  • streamlit: Web interface
  • torch: PyTorch for model inference
  • transformers: Hugging Face model loading
  • Pillow (imported as PIL): Image loading and processing
  • matplotlib: Visualization
  • numpy: Numerical operations
  • opencv-python: Image processing utilities

System Requirements

  • Python 3.8+
  • 4GB+ RAM recommended
  • GPU recommended for faster processing

🐛 Troubleshooting

Common Issues:

  1. Model Download Fails:

    • Check internet connection
    • Ensure Hugging Face access (may require token for some models)
  2. CUDA Out of Memory:

    • Try smaller model size
    • Reduce image resolution
    • Use CPU mode by setting the environment variable CUDA_VISIBLE_DEVICES="" before launching the app
  3. Slow Processing:

    • Use GPU if available
    • Try sam2-hiera-tiny model for faster inference
  4. Import Errors:

    • Ensure all dependencies are installed: pip install -r requirements.txt
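If changing the shell environment is inconvenient, the CPU-only workaround can also be applied from Python. This is a sketch with a hypothetical helper name; the important detail is that the variable must be set before PyTorch initializes CUDA, so calling it before `import torch` is the safe choice.

```python
import os

def force_cpu():
    """Hide all CUDA devices from this process (call before importing torch)."""
    # An empty value means no GPU is visible to CUDA in this process.
    os.environ["CUDA_VISIBLE_DEVICES"] = ""

force_cpu()
```

After this call, `torch.cuda.is_available()` should return `False` in the same process, and inference falls back to the CPU.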

📁 File Structure

SAM/
├── streamlit_sam_app.py    # Main application
├── fixed_sam_interface.py  # Original Gradio version
├── requirements.txt        # Dependencies
└── README.md               # This file

🎨 Interface Screenshots

The app features a clean, modern interface with:

  • Full-width image display
  • Intuitive sidebar controls
  • Real-time point visualization
  • Side-by-side result comparison
  • Comprehensive download options

🤝 Contributing

Feel free to submit issues, feature requests, or pull requests!

📄 License

This project uses Meta's SAM 2.0 model. Please refer to Meta's license terms for the model weights.

🙏 Acknowledgments

  • Meta AI for the incredible SAM 2.0 model
  • Streamlit for the amazing web app framework
  • Hugging Face for model hosting
  • The open-source community for all the dependencies