---
title: Visual Ai
emoji: 🖼
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 5.20.1
app_file: app.py
pinned: false
license: mit
short_description: What you wish to see in the output image.
---

# Stable Diffusion Image Generator

## Overview
This project provides a **Stable Diffusion** image generator powered by the `stabilityai/stable-diffusion-2-1` model. It's optimized for **GPU execution with CUDA** but includes a **CPU fallback** option, allowing flexibility based on hardware availability. The application uses the `diffusers` library and a `gradio`-based UI for interactive image generation.

## Features
- Runs on **GPU (CUDA)** with FP16 precision and memory optimizations or **CPU** with FP32 precision.
- Customizable parameters: prompt, resolution, seed, inference steps, and guidance scale.
- Toggle between GPU and CPU execution via the UI.
- Built-in performance optimizations for GPU (e.g., memory-efficient attention, tiling).

## Prerequisites
- **Python 3.8+**
- A **CUDA-compatible GPU** (optional but recommended for performance).
- A **Hugging Face account** and API token for model access.

### Required Dependencies
- `torch` (with CUDA support for GPU usage)
- `diffusers` (for the Stable Diffusion pipeline)
- `gradio` (for the UI)
- `huggingface_hub` (for authentication)
- `xformers` (optional, for GPU memory optimization)
- `transformers` (provides the text encoder and tokenizer that the pipeline loads)

### Install Dependencies
For GPU support (adjust PyTorch CUDA version as needed):
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install diffusers[torch] gradio huggingface_hub transformers
pip install xformers  # Optional, for GPU memory optimization
```
For CPU-only:
```bash
pip install torch torchvision torchaudio
pip install diffusers[torch] gradio huggingface_hub transformers
```

## Environment Setup
Set your **Hugging Face API token** as an environment variable:
```bash
export HUGGINGFACE_TOKEN=your_huggingface_api_token
```

## Run the Application
```bash
python app.py
```
This launches a Gradio UI where you can input parameters and generate images.

## Code Implementation
The pipeline dynamically selects the device (`cuda` or `cpu`) based on availability and user preference. Here's a summary of the implementation:

```python
import torch
from diffusers import StableDiffusionPipeline
import gradio as gr
import os
import time
import logging
from huggingface_hub import login

# Logging setup
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")

# Load and authenticate with Hugging Face token
hf_token = os.getenv("HUGGINGFACE_TOKEN")
if not hf_token:
    raise ValueError("❌ Error: Hugging Face token not found!")
login(token=hf_token)

# Model setup
model_id = "stabilityai/stable-diffusion-2-1"
device = "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if device == "cuda" else torch.float32

pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    variant="fp16" if device == "cuda" else None,  # recent diffusers: replaces revision="fp16"
    token=hf_token  # recent diffusers: replaces the deprecated use_auth_token
)

# GPU optimizations (if applicable)
if device == "cuda":
    pipe.to("cuda")
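    try:
        # xformers is optional; continue without it if it is missing
        pipe.enable_xformers_memory_efficient_attention()
    except Exception as exc:
        logging.warning(f"xformers not enabled: {exc}")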
    pipe.vae.enable_tiling()
    pipe.enable_attention_slicing()
    torch.backends.cuda.matmul.allow_tf32 = True

logging.info(f"🚀 Running on: {device.upper()} with {torch_dtype}")

# Image generation function
def generate_image(prompt, seed, resolution, steps, guidance, use_gpu):
    device = "cuda" if use_gpu and torch.cuda.is_available() else "cpu"
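    # Cast to FP32 on CPU, where half precision is poorly supported
    pipe.to(device, torch.float16 if device == "cuda" else torch.float32)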
    width, height = map(int, resolution.split("x"))
    generator = torch.Generator(device).manual_seed(int(seed)) if seed != "-1" else None

    with torch.autocast("cuda") if device == "cuda" else torch.no_grad():
        image = pipe(prompt, num_inference_steps=int(steps), guidance_scale=float(guidance),
                     generator=generator, width=width, height=height).images[0]
    return image

# Gradio UI setup
with gr.Blocks() as demo:
    gr.Markdown("# 🖌️ **Stable Diffusion Image Generator**")
    with gr.Row():
        with gr.Column():
            prompt_input = gr.Textbox(label="🎨 Prompt")
            resolution_input = gr.Textbox(label="📏 Resolution", value="512x512")
            seed_input = gr.Textbox(label="🔢 Seed (-1 for random)", value="-1")
            steps_input = gr.Slider(10, 50, value=30, label="🛠️ Inference Steps")
            guidance_input = gr.Slider(1.0, 15.0, value=7.5, label="🎛️ Guidance Scale")
            gpu_toggle = gr.Checkbox(label="⚡ Use GPU (if available)", value=True)
            generate_button = gr.Button("🚀 Generate Image")
        with gr.Column():
            image_output = gr.Image(label="🖼️ Generated Image")
    generate_button.click(fn=generate_image, inputs=[prompt_input, seed_input, resolution_input,
                                                    steps_input, guidance_input, gpu_toggle],
                          outputs=image_output)

demo.launch()
```
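
The resolution and seed arrive as free-form text, so it may help to validate them before they reach the pipeline. The sketch below is one possible approach and is not part of the app above: `parse_resolution` and `parse_seed` are hypothetical helper names. It relies on the Stable Diffusion pipeline's requirement that width and height be divisible by 8.

```python
from typing import Optional, Tuple


def parse_resolution(text: str) -> Tuple[int, int]:
    """Parse a 'WIDTHxHEIGHT' string such as '512x512' into (width, height)."""
    parts = text.lower().replace(" ", "").split("x")
    if len(parts) != 2 or not all(p.isdecimal() for p in parts):
        raise ValueError(f"Expected WIDTHxHEIGHT, got {text!r}")
    width, height = int(parts[0]), int(parts[1])
    # The pipeline rejects sizes that are not divisible by 8
    if width <= 0 or height <= 0 or width % 8 or height % 8:
        raise ValueError("Width and height must be positive multiples of 8")
    return width, height


def parse_seed(text: str) -> Optional[int]:
    """Return None for '-1' (random seed), otherwise the seed as an int."""
    seed = int(text.strip())
    return None if seed == -1 else seed
```

Inside `generate_image`, these could stand in for the `map(int, resolution.split("x"))` and `seed != "-1"` expressions, turning malformed input into a readable error instead of a stack trace from the pipeline.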

## Key Notes
- **Device Flexibility:** The script uses the GPU by default when one is available and falls back to CPU when the GPU toggle is off or no GPU is detected.
- **Optimizations:** GPU mode uses FP16, memory-efficient attention (via `xformers`), tiling, and attention slicing.
- **Mixed Precision:** GPU inference runs under `torch.autocast`; CPU inference runs under `torch.no_grad` without autocasting.
- **Optional `xformers`:** Needed only for memory-efficient attention on GPU; install it if using CUDA.
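
The device and precision rule from these notes can be written as a small pure function, which makes it easy to check without a GPU. This is a minimal sketch; `select_device` is a hypothetical name, and it returns the dtype as a string so the example does not need `torch` installed.

```python
from typing import Tuple


def select_device(use_gpu: bool, cuda_available: bool) -> Tuple[str, str]:
    """Return (device, dtype_name) following the GPU-first, CPU-fallback rule."""
    if use_gpu and cuda_available:
        return "cuda", "float16"  # half precision on GPU
    return "cpu", "float32"  # full precision on CPU
```

In the app, `cuda_available` would come from `torch.cuda.is_available()`, and the dtype name could be resolved with `getattr(torch, dtype_name)`.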

## Troubleshooting
### Issue: `ValueError: ❌ Error: Hugging Face token not found!`
**Solution:** Set the `HUGGINGFACE_TOKEN` environment variable:
```bash
export HUGGINGFACE_TOKEN=your_huggingface_api_token
```

### Issue: GPU not detected but expected
**Solution:**
- Check CUDA installation: `nvidia-smi`
- Ensure PyTorch is installed with CUDA support: `pip list | grep torch`
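
`pip list` shows only the installed version, so asking PyTorch directly may be more informative. The snippet below is a small diagnostic sketch; `cuda_report` is a hypothetical helper name, not part of the app.

```python
def cuda_report() -> dict:
    """Collect basic facts about the local PyTorch/CUDA setup."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `cuda_build` is `None`, the installed wheel is CPU-only and PyTorch needs to be reinstalled from a CUDA index URL such as the one in the install instructions.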

### Issue: `enable_xformers_memory_efficient_attention` fails
**Solution:** Install `xformers`:
```bash
pip install xformers
```

## Conclusion
This project delivers a flexible and efficient **Stable Diffusion** image generator, balancing GPU performance with CPU compatibility. Enjoy creating AI art with ease! 🚀