---
title: Visual Ai
emoji: 🖼
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 5.20.1
app_file: app.py
pinned: false
license: mit
short_description: What you wish to see in the output image.
---
# Stable Diffusion Image Generator
## Overview
This project provides a **Stable Diffusion** image generator powered by the `stabilityai/stable-diffusion-2-1` model. It’s optimized for **GPU execution with CUDA** but includes a **CPU fallback** option, allowing flexibility based on hardware availability. The application uses the `diffusers` library and a `gradio`-based UI for interactive image generation.
## Features
- Runs on **GPU (CUDA)** with FP16 precision and memory optimizations or **CPU** with FP32 precision.
- Customizable parameters: prompt, resolution, seed, inference steps, and guidance scale.
- Toggle between GPU and CPU execution via the UI.
- Built-in performance optimizations for GPU (e.g., memory-efficient attention, tiling).
## Prerequisites
- **Python 3.10+** (Gradio 5, which this Space pins via `sdk_version: 5.20.1`, does not support older versions)
- A **CUDA-compatible GPU** (optional but recommended for performance).
- A **Hugging Face account** and API token for model access.
### Required Dependencies
- `torch` (with CUDA support for GPU usage)
- `diffusers` (for the Stable Diffusion pipeline)
- `gradio` (for the UI)
- `huggingface_hub` (for authentication)
- `xformers` (optional, for GPU memory optimization)
- `transformers` (provides the text encoder and tokenizer that the Stable Diffusion pipeline loads)
### Install Dependencies
For GPU support (adjust PyTorch CUDA version as needed):
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install "diffusers[torch]" gradio huggingface_hub transformers  # quotes keep zsh from globbing the brackets
pip install xformers # Optional, for GPU memory optimization
```
For CPU-only:
```bash
pip install torch torchvision torchaudio
pip install "diffusers[torch]" gradio huggingface_hub transformers  # quotes keep zsh from globbing the brackets
```
## Environment Setup
Set your **Hugging Face API token** as an environment variable:
```bash
export HUGGINGFACE_TOKEN=your_huggingface_api_token
```
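As a minimal sketch of how the token lookup could fail early with an actionable message, the hypothetical helper `read_hf_token` below (not part of `app.py`) takes the environment mapping as a parameter so it can be exercised without touching the real environment:

```python
import os
from typing import Mapping, Optional


def read_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the token stored in HUGGINGFACE_TOKEN, or raise a clear error.

    Hypothetical helper for illustration; the app itself reads the
    variable directly with os.getenv.
    """
    env = os.environ if env is None else env
    # Treat a missing or whitespace-only value as "not set"
    token = env.get("HUGGINGFACE_TOKEN", "").strip()
    if not token:
        raise ValueError(
            "HUGGINGFACE_TOKEN is not set; export it before running app.py"
        )
    return token
```

Calling `read_hf_token()` with no argument reads from `os.environ`, which is the behavior the app relies on.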
## Run the Application
```bash
python app.py
```
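This launches a Gradio UI where you can enter a prompt, adjust the generation parameters, and generate images. By default Gradio serves the UI locally at `http://127.0.0.1:7860`.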
## Code Implementation
The pipeline dynamically selects the device (`cuda` or `cpu`) based on availability and user preference. Here’s a summary of the implementation:
```python
import torch
from diffusers import StableDiffusionPipeline
import gradio as gr
import os
import time
import logging
from huggingface_hub import login
# Logging setup
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
# Load and authenticate with Hugging Face token
hf_token = os.getenv("HUGGINGFACE_TOKEN")
if not hf_token:
    raise ValueError("❌ Error: Hugging Face token not found!")
login(token=hf_token)
# Model setup
model_id = "stabilityai/stable-diffusion-2-1"
device = "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if device == "cuda" else torch.float32
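pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    # Load the half-precision weight files on GPU; `variant` supersedes revision="fp16"
    variant="fp16" if device == "cuda" else None,
    token=hf_token,  # `token` supersedes the deprecated `use_auth_token`
)
# GPU optimizations (if applicable)
if device == "cuda":
    pipe.to("cuda")
    try:
        # xformers is optional, so keep the default attention when it is missing
        pipe.enable_xformers_memory_efficient_attention()
    except Exception as exc:
        logging.warning(f"xformers unavailable, using default attention: {exc}")
    pipe.vae.enable_tiling()
    pipe.enable_attention_slicing()
    torch.backends.cuda.matmul.allow_tf32 = True
logging.info(f"🚀 Running on: {device.upper()} with {torch_dtype}")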
# Image generation function
def generate_image(prompt, seed, resolution, steps, guidance, use_gpu):
    device = "cuda" if use_gpu and torch.cuda.is_available() else "cpu"
    # Cast while moving: FP16 weights are slow or unsupported on CPU
    pipe.to(device, torch.float16 if device == "cuda" else torch.float32)
    width, height = map(int, resolution.lower().split("x"))
    # A seed of -1 leaves the generator unset, so each run is random
    generator = torch.Generator(device).manual_seed(int(seed)) if str(seed).strip() != "-1" else None
    with torch.autocast("cuda") if device == "cuda" else torch.no_grad():
        image = pipe(prompt, num_inference_steps=int(steps), guidance_scale=float(guidance),
                     generator=generator, width=width, height=height).images[0]
    return image
# Gradio UI setup
with gr.Blocks() as demo:
    gr.Markdown("# 🖌️ **Stable Diffusion Image Generator**")
    with gr.Row():
        with gr.Column():
            prompt_input = gr.Textbox(label="🎨 Prompt")
            resolution_input = gr.Textbox(label="📐 Resolution", value="512x512")
            seed_input = gr.Textbox(label="🔢 Seed (-1 for random)", value="-1")
            steps_input = gr.Slider(10, 50, value=30, label="🛠️ Inference Steps")
            guidance_input = gr.Slider(1.0, 15.0, value=7.5, label="🎛️ Guidance Scale")
            gpu_toggle = gr.Checkbox(label="⚡ Use GPU (if available)", value=True)
            generate_button = gr.Button("🚀 Generate Image")
        with gr.Column():
            image_output = gr.Image(label="🖼️ Generated Image")
    generate_button.click(fn=generate_image,
                          inputs=[prompt_input, seed_input, resolution_input,
                                  steps_input, guidance_input, gpu_toggle],
                          outputs=image_output)
demo.launch()
```
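The resolution field is free text, so a malformed value such as `512` or `512x` would surface as an unpacking or `int()` error inside `generate_image`. A minimal sketch of stricter parsing follows. `parse_resolution` is a hypothetical helper, not part of `app.py`, and the multiple-of-8 check mirrors the constraint that the Stable Diffusion pipelines in `diffusers` place on width and height:

```python
import re
from typing import Tuple

# Accepts "512x512", "768X512", and tolerates surrounding or inner spaces
_RESOLUTION = re.compile(r"^\s*(\d+)\s*[xX]\s*(\d+)\s*$")


def parse_resolution(text: str) -> Tuple[int, int]:
    """Parse a WIDTHxHEIGHT string into (width, height), validating both values."""
    match = _RESOLUTION.match(text)
    if match is None:
        raise ValueError(f"expected WIDTHxHEIGHT, got {text!r}")
    width, height = int(match.group(1)), int(match.group(2))
    # Zero or non-multiples of 8 are rejected before the pipeline is invoked
    if width == 0 or height == 0 or width % 8 or height % 8:
        raise ValueError("width and height must be positive multiples of 8")
    return width, height
```

Raising `ValueError` before the pipeline call keeps the error message about the input rather than about tensor shapes.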
## Key Notes
- **Device Flexibility:** The script defaults to GPU when one is available and falls back to CPU when the GPU checkbox is cleared or no GPU is detected.
- **Optimizations:** GPU mode uses FP16, memory-efficient attention (via `xformers`), tiling, and attention slicing.
- **Mixed Precision:** Uses `torch.autocast` on GPU; `torch.no_grad` on CPU.
- **Optional `xformers`:** Not needed on CPU; install it when using CUDA, because the GPU path enables memory-efficient attention through it.
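The device and precision rules in these notes reduce to a small decision function. The sketch below uses plain strings instead of `torch` objects so it runs without PyTorch installed. `choose_device_and_dtype` is a hypothetical name, and in the app the `cuda_available` flag would come from `torch.cuda.is_available()`:

```python
from typing import Tuple


def choose_device_and_dtype(use_gpu: bool, cuda_available: bool) -> Tuple[str, str]:
    """Return (device, dtype name) following the GPU-first, CPU-fallback rule."""
    if use_gpu and cuda_available:
        # GPU path: half precision
        return "cuda", "float16"
    # CPU path: taken when the user opts out or no GPU is present
    return "cpu", "float32"
```

Keeping the decision in one place makes it harder for the device and the precision to drift apart, for example FP16 weights ending up on the CPU.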
## Troubleshooting
### Issue: `ValueError: ❌ Error: Hugging Face token not found!`
**Solution:** Set the `HUGGINGFACE_TOKEN` environment variable:
```bash
export HUGGINGFACE_TOKEN=your_huggingface_api_token
```
### Issue: GPU not detected but expected
**Solution:**
- Check CUDA installation: `nvidia-smi`
- Ensure PyTorch is installed with CUDA support: `python -c "import torch; print(torch.cuda.is_available())"` should print `True`
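For a fuller picture of what PyTorch itself reports, a short diagnostic script can help. This is a sketch only: `describe_torch` is a hypothetical helper, and it guards the import so that it also runs where PyTorch is missing:

```python
import importlib.util


def describe_torch() -> dict:
    """Summarize the local PyTorch install without failing if it is absent."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "cuda_available": False, "cuda_build": None}
    import torch

    return {
        "installed": True,
        "cuda_available": torch.cuda.is_available(),
        # torch.version.cuda is None for CPU-only builds
        "cuda_build": torch.version.cuda,
    }


if __name__ == "__main__":
    print(describe_torch())
```

If `cuda_build` is `None`, the installed wheel is CPU-only and would need to be replaced with a CUDA build. If it is set but `cuda_available` is `False`, the driver or GPU visibility is the more likely cause.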
### Issue: `enable_xformers_memory_efficient_attention` fails
**Solution:** Install `xformers`:
```bash
pip install xformers
```
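Because `xformers` is listed as optional, another way to avoid this failure is to check for the package before enabling the optimization. A minimal sketch, assuming a hypothetical helper `xformers_installed` and a `pipe` object like the one in the code above:

```python
import importlib.util


def xformers_installed() -> bool:
    """Return True when the xformers package can be found on this system."""
    return importlib.util.find_spec("xformers") is not None


# Possible use inside the GPU branch (pipe is the StableDiffusionPipeline instance):
# if xformers_installed():
#     pipe.enable_xformers_memory_efficient_attention()
```

The check only confirms that the package is present; a build that does not match the installed PyTorch or CUDA version can still fail when the optimization is enabled.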
## Conclusion
This project delivers a flexible and efficient **Stable Diffusion** image generator, balancing GPU performance with CPU compatibility. Enjoy creating AI art with ease! 🚀