yingzhac committed
Commit f2a632c · Parent: eaa2696

Initial commit for SD ControlNet Canny application

Files changed (3)
  1. README.md +26 -25
  2. app.py +83 -32
  3. requirements.txt +2 -1
README.md CHANGED
@@ -1,6 +1,6 @@
 ---
-title: Sdxl Refiner
-emoji: 🖼
+title: SD ControlNet Canny
+emoji: 🎨
 colorFrom: purple
 colorTo: red
 sdk: gradio
@@ -8,48 +8,49 @@ sdk_version: 5.25.2
 app_file: app.py
 pinned: false
 license: mit
-short_description: sdxl_refiner
+short_description: Stable Diffusion with ControlNet Canny Edge Detection
 ---
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
-# InstructPix2Pix Application
+# ControlNet Canny - Edge Guided Image Generation
 
-This application allows you to edit images using natural language instructions powered by the [InstructPix2Pix](https://github.com/timothybrooks/instruct-pix2pix) model.
+This application uses the [ControlNet Canny](https://huggingface.co/lllyasviel/sd-controlnet-canny) model to control the image generation process through edge detection. ControlNet lets you use an image's edge structure to guide Stable Diffusion toward images that conform to a specific structure.
 
 ## Setup
 
 1. Install the required dependencies:
 
 ```bash
 pip install -r requirements.txt
 ```
 
 2. Run the application:
 
 ```bash
 python app.py
 ```
 
 ## Usage
 
-1. Upload an image or use one of the examples
-2. Enter an instruction for how you want to edit the image (e.g., "Make it look like winter", "Turn the sky into a sunset")
-3. Click "Run" to generate the edited image
-4. Adjust settings in the "Advanced Settings" section for more control:
-   - Image guidance scale: Controls how closely the output follows the input image structure
-   - Guidance scale: Controls how closely the output follows your text instruction
-   - Number of inference steps: Higher values provide better quality but take longer
+1. Upload an image or use one of the example images
+2. Enter a prompt describing the image you want to generate (e.g., "A fantasy landscape with mountains and lakes")
+3. Click "Run" to generate the edge-guided image
+4. Adjust the parameters in the "Advanced Settings" section for more control:
+   - Canny low/high thresholds: Control the sensitivity of the edge detection
+   - Guidance scale: Controls how closely the generated image matches the text prompt
+   - Number of inference steps: Higher values give better quality but take longer
 
-## Examples of Instructions
+## Example Prompts
 
-- "Turn the sky into a sunset"
-- "Make it look like winter"
-- "Turn him into a cyborg"
-- "Make it look like a painting"
-- "Add rain to the scene"
-- "Make it look like night time"
+- "A fantasy landscape with mountains and lakes"
+- "A cyberpunk-style city street scene"
+- "A cartoon character in winter clothing"
+- "A futuristic architectural design"
+- "A dreamlike forest scene"
 
 ## Technical Details
 
-This app uses the [timbrooks/instruct-pix2pix](https://huggingface.co/timbrooks/instruct-pix2pix) model from Hugging Face with the Diffusers library. The model was designed to edit images based on natural language instructions.
+This application uses the Hugging Face [lllyasviel/sd-controlnet-canny](https://huggingface.co/lllyasviel/sd-controlnet-canny) model with the Diffusers library. The model extracts the input image's edges with the Canny edge detection algorithm, then uses those edges to guide Stable Diffusion in generating a new image that follows the same structure.
+
+ControlNet preserves the structure and composition of the input image while changing its style and content according to the text prompt.
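For readers who want the change in a nutshell: the generation flow the new README describes (extract Canny edges with OpenCV, then feed them through ControlNet into Stable Diffusion) condenses to a short standalone script. Below is a minimal sketch based on this commit's app.py; the model IDs, scheduler, and 100/200 thresholds come from the diff that follows, while `input.png` and `output.png` are placeholder paths and the float16/CUDA setup assumes a GPU (app.py itself falls back to float32 on CPU).

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler

# Extract a Canny edge map from the input image (100/200 are the app's slider defaults)
image = np.array(Image.open("input.png").convert("RGB"))  # placeholder input path
edges = cv2.Canny(cv2.cvtColor(image, cv2.COLOR_RGB2GRAY), 100, 200)
canny_image = Image.fromarray(cv2.cvtColor(edges, cv2.COLOR_GRAY2RGB))

# Wire the Canny ControlNet into a Stable Diffusion v1.5 pipeline
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    safety_checker=None,
).to("cuda")  # assumes a CUDA device; use float32 on CPU
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

# The edge map constrains structure; the prompt drives style and content
result = pipe("A fantasy landscape with mountains and lakes", image=canny_image).images[0]
result.save("output.png")  # placeholder output path
```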
app.py CHANGED
@@ -1,31 +1,62 @@
 import gradio as gr
 import numpy as np
 import random
+import cv2
 
 import spaces
 import torch
-from diffusers import StableDiffusionInstructPix2PixPipeline, EulerAncestralDiscreteScheduler
+from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
 from diffusers.utils import load_image
+from PIL import Image
 
 device = "cuda" if torch.cuda.is_available() else "cpu"
-model_repo_id = "timbrooks/instruct-pix2pix"
+sd_model_id = "runwayml/stable-diffusion-v1-5"
+controlnet_model_id = "lllyasviel/sd-controlnet-canny"
 
 if torch.cuda.is_available():
     torch_dtype = torch.float16
 else:
     torch_dtype = torch.float32
 
-pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
-    model_repo_id,
+# Load ControlNet model
+controlnet = ControlNetModel.from_pretrained(
+    controlnet_model_id,
+    torch_dtype=torch_dtype
+)
+
+# Load Stable Diffusion with ControlNet
+pipe = StableDiffusionControlNetPipeline.from_pretrained(
+    sd_model_id,
+    controlnet=controlnet,
     torch_dtype=torch_dtype,
     safety_checker=None
 )
 pipe = pipe.to(device)
-pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
+pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
 
 MAX_SEED = np.iinfo(np.int32).max
 MAX_IMAGE_SIZE = 1024
 
+def apply_canny(image, low_threshold, high_threshold):
+    """Apply Canny edge detection to the image"""
+    # Convert PIL image to numpy array
+    image_np = np.array(image)
+
+    # Convert to grayscale if the image is colored
+    if len(image_np.shape) == 3 and image_np.shape[2] == 3:
+        image_gray = cv2.cvtColor(image_np, cv2.COLOR_RGB2GRAY)
+    else:
+        image_gray = image_np
+
+    # Apply Canny edge detection
+    edges = cv2.Canny(image_gray, low_threshold, high_threshold)
+
+    # Convert back to RGB for the model
+    edges_rgb = cv2.cvtColor(edges, cv2.COLOR_GRAY2RGB)
+
+    # Convert back to PIL image
+    return Image.fromarray(edges_rgb)
+
 @spaces.GPU
 def infer(
     prompt,
@@ -33,7 +64,8 @@ def infer(
     negative_prompt,
     seed,
     randomize_seed,
-    image_guidance_scale,
+    canny_low_threshold,
+    canny_high_threshold,
     guidance_scale,
     num_inference_steps,
     progress=gr.Progress(track_tqdm=True),
@@ -55,24 +87,26 @@ def infer(
         width = MAX_IMAGE_SIZE
     if height > MAX_IMAGE_SIZE:
         height = MAX_IMAGE_SIZE
+
+    # Apply Canny edge detection
+    canny_image = apply_canny(input_image, canny_low_threshold, canny_high_threshold)
 
     image = pipe(
        prompt=prompt,
-        image=input_image,
+        image=canny_image,
        negative_prompt=negative_prompt,
        guidance_scale=guidance_scale,
-        image_guidance_scale=image_guidance_scale,
        num_inference_steps=num_inference_steps,
        generator=generator,
     ).images[0]
 
-    return image, seed
+    return image, seed, canny_image
 
 
 examples = [
-    ["Turn the sky into a sunset", "https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/aa_xl/000000009.png"],
-    ["Turn him into a cyborg", "https://raw.githubusercontent.com/timothybrooks/instruct-pix2pix/main/imgs/example.jpg"],
-    ["Make it look like winter", "https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/aa_xl/000000009.png"],
+    ["A fantasy landscape with mountains and a lake", "https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/aa_xl/000000009.png"],
+    ["A cyberpunk city street scene", "https://raw.githubusercontent.com/timothybrooks/instruct-pix2pix/main/imgs/example.jpg"],
+    ["A cartoon character in winter clothing", "https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/controlnet/person_image.png"],
 ]
@@ -84,7 +118,7 @@ css = """
 
 with gr.Blocks(css=css) as demo:
     with gr.Column(elem_id="col-container"):
-        gr.Markdown(" # InstructPix2Pix - Image Editing")
+        gr.Markdown(" # ControlNet Canny - Edge Guided Image Generation")
 
         with gr.Row():
             with gr.Column(scale=1):
@@ -94,11 +128,19 @@ with gr.Blocks(css=css) as demo:
                     height=400
                 )
             with gr.Column(scale=1):
-                result = gr.Image(label="Result", height=400)
+                canny_image = gr.Image(
+                    label="Canny Edge Detection",
+                    height=400
+                )
+            with gr.Column(scale=1):
+                result = gr.Image(
+                    label="Result",
+                    height=400
+                )
 
         prompt = gr.Text(
-            label="Instruction",
-            placeholder="Enter your instruction (e.g., 'turn the sky into a sunset')",
+            label="Prompt",
+            placeholder="Enter your prompt (e.g., 'a fantasy landscape with mountains')",
         )
 
         run_button = gr.Button("Run", variant="primary")
@@ -111,22 +153,30 @@ with gr.Blocks(css=css) as demo:
             )
 
             with gr.Row():
-                image_guidance_scale = gr.Slider(
-                    label="Image guidance scale",
-                    minimum=0.0,
-                    maximum=5.0,
-                    step=0.1,
-                    value=1.0,
+                canny_low_threshold = gr.Slider(
+                    label="Canny Low Threshold",
+                    minimum=1,
+                    maximum=255,
+                    step=1,
+                    value=100,
                 )
 
-                guidance_scale = gr.Slider(
-                    label="Guidance scale",
-                    minimum=1.0,
-                    maximum=20.0,
-                    step=0.1,
-                    value=7.5,
+                canny_high_threshold = gr.Slider(
+                    label="Canny High Threshold",
+                    minimum=1,
+                    maximum=255,
+                    step=1,
+                    value=200,
                 )
 
+                guidance_scale = gr.Slider(
+                    label="Guidance scale",
+                    minimum=1.0,
+                    maximum=20.0,
+                    step=0.1,
+                    value=7.5,
+                )
+
             seed = gr.Slider(
                 label="Seed",
                 minimum=0,
@@ -142,13 +192,13 @@ with gr.Blocks(css=css) as demo:
                 minimum=1,
                 maximum=100,
                 step=1,
-                value=20,
+                value=30,
             )
 
     gr.Examples(
         examples=examples,
        inputs=[prompt, input_image],
-        outputs=[result, seed],
+        outputs=[result, seed, canny_image],
        fn=infer,
        cache_examples=True,
     )
@@ -162,11 +212,12 @@ with gr.Blocks(css=css) as demo:
            negative_prompt,
            seed,
            randomize_seed,
-            image_guidance_scale,
+            canny_low_threshold,
+            canny_high_threshold,
            guidance_scale,
            num_inference_steps,
         ],
-        outputs=[result, seed],
+        outputs=[result, seed, canny_image],
     )
 
 if __name__ == "__main__":
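A note on the two new threshold sliders: `cv2.Canny` uses hysteresis thresholding, so pixels whose gradient exceeds the high threshold become strong edges, pixels below the low threshold are discarded, and pixels in between survive only if they connect to a strong edge. The self-contained sketch below (not part of the commit; `input.png` is a placeholder path) makes the sensitivity concrete by printing how edge density drops as the thresholds rise, with the slider defaults of 100/200 as the middle setting:

```python
import cv2
import numpy as np
from PIL import Image

# Load the control image as grayscale, as apply_canny does for RGB inputs
image = np.array(Image.open("input.png").convert("L"))  # placeholder path

# Lower thresholds keep more (noisier) edges; higher ones keep only strong contours
for low, high in [(50, 100), (100, 200), (200, 255)]:  # (100, 200) matches the slider defaults
    edges = cv2.Canny(image, low, high)
    density = edges.mean() / 255  # fraction of pixels marked as edges
    print(f"low={low:3d} high={high:3d} -> edge density {density:.3f}")
```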
requirements.txt CHANGED
@@ -5,4 +5,5 @@ accelerate>=0.21.0
 gradio>=3.50.0
 numpy>=1.24.0
 Pillow>=10.0.0
-safetensors>=0.3.2
+safetensors>=0.3.2
+opencv-python>=4.8.0
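Since `opencv-python` is the one genuinely new runtime dependency, a quick environment check can catch a missing or stale install before the Space boots. This is a hypothetical pre-flight snippet, not part of the commit:

```python
# Hypothetical pre-flight check (not part of this commit): confirm the pinned
# packages import cleanly and report their versions.
import cv2, gradio, numpy, PIL, safetensors

print("opencv-python:", cv2.__version__)        # requirements.txt asks for >= 4.8.0
print("gradio:", gradio.__version__)            # >= 3.50.0
print("numpy:", numpy.__version__)              # >= 1.24.0
print("Pillow:", PIL.__version__)               # >= 10.0.0
print("safetensors:", safetensors.__version__)  # >= 0.3.2
```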