---
title: Visual Ai
emoji: 🖼
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 5.20.1
app_file: app.py
pinned: false
license: mit
short_description: What you wish to see in the output image.
---

# Stable Diffusion Image Generator

## Overview

This project provides a **Stable Diffusion** image generator powered by the `stabilityai/stable-diffusion-2-1` model. It's optimized for **GPU execution with CUDA** but includes a **CPU fallback** option, allowing flexibility based on hardware availability. The application uses the `diffusers` library and a `gradio`-based UI for interactive image generation.

## Features

- Runs on **GPU (CUDA)** with FP16 precision and memory optimizations or **CPU** with FP32 precision.
- Customizable parameters: prompt, resolution, seed, inference steps, and guidance scale.
- Toggle between GPU and CPU execution via the UI.
- Built-in performance optimizations for GPU (e.g., memory-efficient attention, tiling).

## Prerequisites

- **Python 3.8+**
- A **CUDA-compatible GPU** (optional but recommended for performance).
- A **Hugging Face account** and API token for model access.

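The prerequisites above can be checked before launching the app. The following is a minimal sketch using only the standard library. The helper name `check_prerequisites` and the keys of the report it returns are illustrative and not part of the project. The `HUGGINGFACE_TOKEN` variable name and the package names are the ones this README uses in its setup sections.

```python
import importlib.util
import os
import sys

# Hypothetical helper (not part of app.py): reports which prerequisites are met.
REQUIRED = ("torch", "diffusers", "gradio", "huggingface_hub", "transformers")
OPTIONAL = ("xformers",)


def check_prerequisites(env=None, min_python=(3, 8)):
    """Return a dict describing the Python version, token, and package status."""
    env = os.environ if env is None else env
    return {
        # True when the running interpreter is at least min_python
        "python_ok": sys.version_info[:2] >= min_python,
        # True when the token variable is present and non-empty
        "token_set": bool(env.get("HUGGINGFACE_TOKEN")),
        # Packages that cannot be located on the import path
        "missing_required": [m for m in REQUIRED if importlib.util.find_spec(m) is None],
        "missing_optional": [m for m in OPTIONAL if importlib.util.find_spec(m) is None],
    }


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(f"{key}: {value}")
```

Running the script prints one line per check. An empty `missing_required` list together with `token_set: True` suggests the app should at least get past its import and authentication steps.
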
### Required Dependencies

- `torch` (with CUDA support for GPU usage)
- `diffusers` (for the Stable Diffusion pipeline)
- `gradio` (for the UI)
- `huggingface_hub` (for authentication)
- `xformers` (for GPU memory optimization; not needed for CPU-only runs)
- `transformers` (provides the text encoder used by the Stable Diffusion pipeline)

### Install Dependencies

For GPU support (adjust PyTorch CUDA version as needed):

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install "diffusers[torch]" gradio huggingface_hub transformers
pip install xformers  # For GPU memory optimization
```

For CPU-only:

```bash
pip install torch torchvision torchaudio
pip install "diffusers[torch]" gradio huggingface_hub transformers
```

## Environment Setup

Set your **Hugging Face API token** as an environment variable:

```bash
export HUGGINGFACE_TOKEN=your_huggingface_api_token
```

## Run the Application

```bash
python app.py
```

This launches a Gradio UI where you can input parameters and generate images.

## Code Implementation

The pipeline dynamically selects the device (`cuda` or `cpu`) based on availability and user preference.

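The selection rule described above can be sketched in isolation. This is a minimal illustration, not the app's code. The function name `select_device` is hypothetical, and CUDA availability is passed in as a plain boolean (in the app it would come from `torch.cuda.is_available()`) so the logic can be exercised on a machine without a GPU. The precision names mirror the FP16-on-GPU, FP32-on-CPU split listed under Features.

```python
from typing import Tuple


def select_device(use_gpu: bool, cuda_available: bool) -> Tuple[str, str]:
    """Return (device, precision) for the requested and available hardware.

    Hypothetical helper: the GPU is used only when the user asks for it AND
    CUDA is available; every other combination falls back to the CPU.
    """
    if use_gpu and cuda_available:
        return "cuda", "float16"
    return "cpu", "float32"


if __name__ == "__main__":
    # Print the outcome for all four combinations of the two inputs.
    for use_gpu in (True, False):
        for cuda_available in (True, False):
            print(use_gpu, cuda_available, select_device(use_gpu, cuda_available))
```

Keeping the rule in a pure function like this makes the fallback behavior easy to test separately from model loading.
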
Here's a summary of the implementation:

```python
import logging
import os

import gradio as gr
import torch
from diffusers import StableDiffusionPipeline
from huggingface_hub import login

# Logging setup
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")

# Load and authenticate with Hugging Face token
hf_token = os.getenv("HUGGINGFACE_TOKEN")
if not hf_token:
    raise ValueError("❌ Error: Hugging Face token not found!")
login(token=hf_token)

# Model setup: FP16 weights on GPU, FP32 on CPU
model_id = "stabilityai/stable-diffusion-2-1"
device = "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if device == "cuda" else torch.float32
pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    # Recent diffusers releases select FP16 weights with `variant`;
    # older releases used revision="fp16".
    variant="fp16" if device == "cuda" else None,
    # `token` replaces the deprecated `use_auth_token` argument.
    token=hf_token,
)

# GPU optimizations (if applicable)
if device == "cuda":
    pipe.to("cuda")
    pipe.enable_xformers_memory_efficient_attention()  # requires the xformers package
    pipe.vae.enable_tiling()
    pipe.enable_attention_slicing()
    torch.backends.cuda.matmul.allow_tf32 = True

logging.info(f"🚀 Running on: {device.upper()} with {torch_dtype}")


# Image generation function
def generate_image(prompt, seed, resolution, steps, guidance, use_gpu):
    device = "cuda" if use_gpu and torch.cuda.is_available() else "cpu"
    # Move and cast the pipeline so CPU runs use FP32 rather than FP16 weights
    pipe.to(device, torch.float16 if device == "cuda" else torch.float32)
    # Resolution arrives as text such as "512x512"
    width, height = map(int, resolution.lower().split("x"))
    # A seed of -1 means "random": the pipeline then draws its own noise
    generator = torch.Generator(device).manual_seed(int(seed)) if int(seed) != -1 else None
    with torch.autocast("cuda") if device == "cuda" else torch.no_grad():
        image = pipe(
            prompt,
            num_inference_steps=int(steps),
            guidance_scale=float(guidance),
            generator=generator,
            width=width,
            height=height,
        ).images[0]
    return image


# Gradio UI setup
with gr.Blocks() as demo:
    gr.Markdown("# 🖌️ **Stable Diffusion Image Generator**")
    with gr.Row():
        with gr.Column():
            prompt_input = gr.Textbox(label="🎨 Prompt")
            resolution_input = gr.Textbox(label="📏 Resolution", value="512x512")
            seed_input = gr.Textbox(label="🔒 Seed (-1 for random)", value="-1")
            steps_input = gr.Slider(10, 50, value=30, label="🛠️ Inference Steps")
            guidance_input = gr.Slider(1.0, 15.0, value=7.5, label="🎛️ Guidance Scale")
            gpu_toggle = gr.Checkbox(label="⚡ Use GPU (if available)", value=True)
            generate_button = gr.Button("🚀 Generate Image")
        with gr.Column():
            image_output = gr.Image(label="🖼️ Generated Image")

    # Wire the button to the generation function
    generate_button.click(
        fn=generate_image,
        inputs=[prompt_input, seed_input, resolution_input, steps_input, guidance_input, gpu_toggle],
        outputs=image_output,
    )

demo.launch()
```

## Key Notes

- **Device Flexibility:** The script defaults to GPU if available and falls back to CPU when the GPU checkbox is cleared or no GPU is detected.
- **Optimizations:** GPU mode uses FP16, memory-efficient attention (via `xformers`), tiling, and attention slicing.
- **Mixed Precision:** Inference runs under `torch.autocast` on GPU and under `torch.no_grad` on CPU.
- **`xformers`:** The GPU path calls `enable_xformers_memory_efficient_attention()`, so install `xformers` when using CUDA. CPU-only runs do not need it.

## Troubleshooting

### Issue: `ValueError: ❌ Error: Hugging Face token not found!`

**Solution:** Set the `HUGGINGFACE_TOKEN` environment variable:

```bash
export HUGGINGFACE_TOKEN=your_huggingface_api_token
```

### Issue: GPU not detected but expected

**Solution:**

- Check CUDA installation: `nvidia-smi`
- Ensure PyTorch is installed with CUDA support: `pip list | grep torch` (CUDA builds from the PyTorch index carry a version suffix such as `+cu118`)

### Issue: `enable_xformers_memory_efficient_attention` fails

**Solution:** Install `xformers`:

```bash
pip install xformers
```

## Conclusion

This project delivers a flexible and efficient **Stable Diffusion** image generator, balancing GPU performance with CPU compatibility. Enjoy creating AI art with ease! 🚀