Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0

@@ -24,17 +24,29 @@ This repository contains an unified ControlNet for FLUX.1-dev model released by
 # Keynotes
 In comparison with [Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro](https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro),
-- Remove mode embedding. Smaller model size (6.6GB -> 4.0GB).
 - Improve on canny and pose, better control and aesthetics.
 - Add support for soft edge. Remove support for tile.
 # Model Cards
 - This ControlNet consists of 6 double blocks and 0 single blocks, the same as [Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro](https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro). Mode embedding is removed.
 - We train the model from scratch for 300k steps using a dataset of 20M high-quality general and human images. We train at 512x512 resolution in BFloat16 with batch size 128 and learning rate 2e-5; the guidance is uniformly sampled from [1, 7]. We set the text drop ratio to 0.20.
-- This model supports multiple control modes, including canny, soft edge, depth, pose, gray.
 - This model can be jointly used with other ControlNets.
 # Inference
 ```python
 import torch
@@ -48,7 +60,11 @@ controlnet = FluxControlNetModel.from_pretrained(controlnet_model_union, torch_d
 pipe = FluxControlNetPipeline.from_pretrained(base_model, controlnet=controlnet, torch_dtype=torch.bfloat16)
 pipe.to("cuda")
-width, height = control_image_depth.size
 image = pipe(
     prompt,
@@ -57,9 +73,9 @@ image = pipe(
     height=height,
     controlnet_conditioning_scale=0.7,
     control_guidance_end=0.8,
-    num_inference_steps=24,
     guidance_scale=3.5,
-    generator=torch.manual_seed(42),
 ).images[0]
 ```

 # Keynotes
 In comparison with [Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro](https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro),
+- Remove mode embedding, resulting in a smaller model size.
 - Improve on canny and pose, better control and aesthetics.
 - Add support for soft edge. Remove support for tile.
 # Model Cards
 - This ControlNet consists of 6 double blocks and 0 single blocks, the same as [Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro](https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro). Mode embedding is removed.
 - We train the model from scratch for 300k steps using a dataset of 20M high-quality general and human images. We train at 512x512 resolution in BFloat16 with batch size 128 and learning rate 2e-5; the guidance is uniformly sampled from [1, 7]. We set the text drop ratio to 0.20.
+- This model supports multiple control modes, including canny, soft edge, depth, pose, and gray. You can use it like a normal ControlNet.
 - This model can be jointly used with other ControlNets.
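
The training recipe above mentions two per-sample randomizations: a guidance value drawn uniformly from [1, 7] and a text drop ratio of 0.20. The sketch below shows one way such sampling might look. The helper name `sample_conditioning` and the choice of an empty string for a dropped prompt are assumptions made for illustration; this is not the actual training code.

```python
import random

GUIDANCE_RANGE = (1.0, 7.0)  # range stated in the model card
TEXT_DROP_RATIO = 0.20       # ratio stated in the model card


def sample_conditioning(prompt, rng):
    """Hypothetical helper: draw a guidance value and maybe drop the prompt."""
    guidance = rng.uniform(*GUIDANCE_RANGE)
    # Assumption: a "dropped" caption is replaced by an empty string.
    if rng.random() < TEXT_DROP_RATIO:
        prompt = ""
    return prompt, guidance


rng = random.Random(0)
samples = [sample_conditioning("a photo of a cat", rng) for _ in range(10_000)]
drop_rate = sum(1 for p, _ in samples if p == "") / len(samples)
print(f"empirical drop rate: {drop_rate:.3f}")
```

Over many samples the empirical drop rate should settle near 0.20, and every sampled guidance value stays inside [1, 7].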
+# Showcases
+<table>
+  <tr>
+    <td><img src="./images/canny.png" alt="canny" style="width:100%"></td>
+    <td><img src="./images/softedge.png" alt="softedge" style="width:100%"></td>
+    <td><img src="./images/pose.png" alt="pose" style="width:100%"></td>
+    <td><img src="./images/depth.png" alt="depth" style="width:100%"></td>
+    <td><img src="./images/gray.png" alt="gray" style="width:100%"></td>
+  </tr>
+</table>
 # Inference
 ```python
 import torch
 pipe = FluxControlNetPipeline.from_pretrained(base_model, controlnet=controlnet, torch_dtype=torch.bfloat16)
 pipe.to("cuda")
+# replace with other conds
+control_image = load_image("./conds/canny.png")
+width, height = control_image.size
+prompt = "A young girl stands gracefully at the edge of a serene beach, her long, flowing hair gently tousled by the sea breeze. She wears a soft, pastel-colored dress that complements the tranquil blues and greens of the coastal scenery. The golden hues of the setting sun cast a warm glow on her face, highlighting her serene expression. The background features a vast, azure ocean with gentle waves lapping at the shore, surrounded by distant cliffs and a clear, cloudless sky. The composition emphasizes the girl's serene presence amidst the natural beauty, with a balanced blend of warm and cool tones."
 image = pipe(
     prompt,
     height=height,
     controlnet_conditioning_scale=0.7,
     control_guidance_end=0.8,
+    num_inference_steps=30,
     guidance_scale=3.5,
+    generator=torch.Generator(device="cuda").manual_seed(42),
 ).images[0]
 ```
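
To make the interaction of `control_guidance_end=0.8` and `num_inference_steps=30` concrete, the sketch below counts the denoising steps on which the ControlNet would be applied. It assumes the pipeline gates the ControlNet by the fraction of steps completed, which to our understanding is how diffusers ControlNet pipelines treat `control_guidance_start` and `control_guidance_end`. The helper `controlnet_active_steps` is hypothetical, and the exact rule should be checked against the installed diffusers version.

```python
def controlnet_active_steps(num_steps, start=0.0, end=1.0):
    """Hypothetical helper: indices of steps where the ControlNet is applied.

    Assumed rule: step i is skipped if i / num_steps < start
    or (i + 1) / num_steps > end.
    """
    return [
        i
        for i in range(num_steps)
        if not (i / num_steps < start or (i + 1) / num_steps > end)
    ]


active = controlnet_active_steps(num_steps=30, end=0.8)
print(len(active), active[0], active[-1])
```

Under this assumption, the ControlNet conditions the first 24 of the 30 steps (indices 0 through 23), and the remaining steps run without control guidance.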