---
license: other
license_name: flux-1-dev-non-commercial-license
license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
tags:
- Text-to-Image
- ControlNet
- Diffusers
- Flux.1-dev
- image-generation
- Stable Diffusion
base_model: black-forest-labs/FLUX.1-dev
---
# FLUX.1-dev-ControlNet-Union-Pro-2.0
This repository contains a unified ControlNet for the FLUX.1-dev model, released by [Shakker Labs](https://huggingface.co/Shakker-Labs). We provide an [online demo](https://huggingface.co/spaces/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0). A community-provided FP8 quantized version is available at [ABDALLALSWAITI/FLUX.1-dev-ControlNet-Union-Pro-2.0-fp8](https://huggingface.co/ABDALLALSWAITI/FLUX.1-dev-ControlNet-Union-Pro-2.0-fp8).
# Keynotes
Compared with [Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro](https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro), this release:
- Removes the mode embedding, which gives a smaller model size.
- Improves canny and pose, with better control and aesthetics.
- Adds support for soft edge and removes support for tile.
# Model Cards
- This ControlNet consists of 6 double blocks and 0 single blocks. The mode embedding is removed.
- We trained the model from scratch for 300k steps on a dataset of 20M high-quality general and human images. Training used 512x512 resolution in BFloat16 with batch size = 128 and learning rate = 2e-5. The guidance is uniformly sampled from [1, 7], and the text drop ratio is set to 0.20.
- This model supports multiple control modes: canny, soft edge, depth, pose, and gray. You can use it just like a normal ControlNet.
- This model can be used jointly with other ControlNets.
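To make the guidance sampling and text drop ratio above concrete, here is a minimal sketch of how such per-sample conditioning could be drawn. It is an illustration only, not the training code of this model: the function name `sample_conditioning` is hypothetical, and replacing a dropped prompt with an empty string is an assumption about what "text drop" means here.

```python
import random

GUIDANCE_RANGE = (1.0, 7.0)  # guidance is uniformly sampled from [1, 7]
TEXT_DROP_RATIO = 0.20       # fraction of samples trained without text


def sample_conditioning(prompt: str, rng: random.Random) -> tuple[str, float]:
    """Draw the (prompt, guidance) pair for one training sample.

    Hypothetical helper: dropping text is modelled as swapping the prompt
    for an empty string, which is an assumption, not a documented detail.
    """
    guidance = rng.uniform(*GUIDANCE_RANGE)
    if rng.random() < TEXT_DROP_RATIO:
        prompt = ""
    return prompt, guidance


rng = random.Random(0)
samples = [sample_conditioning("a photo of a cat", rng) for _ in range(10_000)]
dropped_fraction = sum(1 for text, _ in samples if text == "") / len(samples)
```

Over many samples, `dropped_fraction` should land close to 0.20 and every guidance value should fall inside [1, 7].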
# Showcases
<table>
<tr>
<td><img src="./images/canny.png" alt="canny" style="height:100%"></td>
</tr>
<tr>
<td><img src="./images/softedge.png" alt="softedge" style="height:100%"></td>
</tr>
<tr>
<td><img src="./images/pose.png" alt="pose" style="height:100%"></td>
</tr>
<tr>
<td><img src="./images/depth.png" alt="depth" style="height:100%"></td>
</tr>
<tr>
<td><img src="./images/gray.png" alt="gray" style="height:100%"></td>
</tr>
</table>
# Inference
```python
import torch
from diffusers.utils import load_image
from diffusers import FluxControlNetPipeline, FluxControlNetModel
base_model = 'black-forest-labs/FLUX.1-dev'
controlnet_model_union = 'Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0'
controlnet = FluxControlNetModel.from_pretrained(controlnet_model_union, torch_dtype=torch.bfloat16)
pipe = FluxControlNetPipeline.from_pretrained(base_model, controlnet=controlnet, torch_dtype=torch.bfloat16)
pipe.to("cuda")
# replace with other conds
control_image = load_image("./conds/canny.png")
width, height = control_image.size
prompt = "A young girl stands gracefully at the edge of a serene beach, her long, flowing hair gently tousled by the sea breeze. She wears a soft, pastel-colored dress that complements the tranquil blues and greens of the coastal scenery. The golden hues of the setting sun cast a warm glow on her face, highlighting her serene expression. The background features a vast, azure ocean with gentle waves lapping at the shore, surrounded by distant cliffs and a clear, cloudless sky. The composition emphasizes the girl's serene presence amidst the natural beauty, with a balanced blend of warm and cool tones."
image = pipe(
    prompt,
    control_image=control_image,
    width=width,
    height=height,
    controlnet_conditioning_scale=0.7,
    control_guidance_end=0.8,
    num_inference_steps=30,
    guidance_scale=3.5,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
```
# Multi-Inference
```python
import torch
from diffusers.utils import load_image
# https://github.com/huggingface/diffusers/pull/11350
# You can import directly from diffusers by installing the latest version from source
# from diffusers import FluxControlNetPipeline, FluxControlNetModel
# use local files for this moment
from pipeline_flux_controlnet import FluxControlNetPipeline
from controlnet_flux import FluxControlNetModel
base_model = 'black-forest-labs/FLUX.1-dev'
controlnet_model_union = 'Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0'
controlnet = FluxControlNetModel.from_pretrained(controlnet_model_union, torch_dtype=torch.bfloat16)
pipe = FluxControlNetPipeline.from_pretrained(base_model, controlnet=[controlnet], torch_dtype=torch.bfloat16) # use [] to enable multi-CNs
pipe.to("cuda")
# replace with other conds
control_image = load_image("./conds/canny.png")
width, height = control_image.size
prompt = "A young girl stands gracefully at the edge of a serene beach, her long, flowing hair gently tousled by the sea breeze. She wears a soft, pastel-colored dress that complements the tranquil blues and greens of the coastal scenery. The golden hues of the setting sun cast a warm glow on her face, highlighting her serene expression. The background features a vast, azure ocean with gentle waves lapping at the shore, surrounded by distant cliffs and a clear, cloudless sky. The composition emphasizes the girl's serene presence amidst the natural beauty, with a balanced blend of warm and cool tones."
image = pipe(
    prompt,
    control_image=[control_image, control_image],  # try with different conds such as canny&depth, pose&depth
    width=width,
    height=height,
    controlnet_conditioning_scale=[0.35, 0.35],
    control_guidance_end=[0.8, 0.8],
    num_inference_steps=30,
    guidance_scale=3.5,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
```
# Recommended Parameters
You can adjust `controlnet_conditioning_scale` and `control_guidance_end` for stronger control and better detail preservation. For better stability, we strongly recommend using a detailed prompt; in some cases, multiple conditions help.
- Canny: use cv2.Canny, controlnet_conditioning_scale=0.7, control_guidance_end=0.8.
- Soft Edge: use [AnylineDetector](https://github.com/huggingface/controlnet_aux), controlnet_conditioning_scale=0.7, control_guidance_end=0.8.
- Depth: use [depth-anything](https://github.com/DepthAnything/Depth-Anything-V2), controlnet_conditioning_scale=0.8, control_guidance_end=0.8.
- Pose: use [DWPose](https://github.com/IDEA-Research/DWPose/tree/onnx), controlnet_conditioning_scale=0.9, control_guidance_end=0.65.
- Gray: use cv2.cvtColor, controlnet_conditioning_scale=0.9, control_guidance_end=0.8.
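The recommended values above can be kept in a small lookup table so that switching control modes does not mean retyping numbers. This is a convenience sketch: the dictionary and the `control_kwargs` helper are hypothetical names, and only the numeric values come from the list above.

```python
# Recommended settings per control mode, copied from the list above.
RECOMMENDED_PARAMS = {
    "canny": {"controlnet_conditioning_scale": 0.7, "control_guidance_end": 0.8},
    "softedge": {"controlnet_conditioning_scale": 0.7, "control_guidance_end": 0.8},
    "depth": {"controlnet_conditioning_scale": 0.8, "control_guidance_end": 0.8},
    "pose": {"controlnet_conditioning_scale": 0.9, "control_guidance_end": 0.65},
    "gray": {"controlnet_conditioning_scale": 0.9, "control_guidance_end": 0.8},
}


def control_kwargs(mode: str) -> dict:
    """Return a copy of the recommended pipeline keyword arguments for `mode`."""
    if mode not in RECOMMENDED_PARAMS:
        supported = ", ".join(sorted(RECOMMENDED_PARAMS))
        raise ValueError(f"Unsupported control mode {mode!r}; expected one of: {supported}")
    return dict(RECOMMENDED_PARAMS[mode])
```

With the pipeline from the Inference section, the call would then look like `pipe(prompt, control_image=control_image, **control_kwargs("pose"), ...)`. Treat these values as starting points and tune them per image.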
# Resources
- [InstantX/FLUX.1-dev-IP-Adapter](https://huggingface.co/InstantX/FLUX.1-dev-IP-Adapter)
- [InstantX/FLUX.1-dev-Controlnet-Canny](https://huggingface.co/InstantX/FLUX.1-dev-Controlnet-Canny)
- [Shakker-Labs/FLUX.1-dev-ControlNet-Depth](https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Depth)
- [Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro](https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro)
# Acknowledgements
This model was developed by [Shakker Labs](https://huggingface.co/Shakker-Labs). The original idea was inspired by [xinsir/controlnet-union-sdxl-1.0](https://huggingface.co/xinsir/controlnet-union-sdxl-1.0). All rights reserved.
# Citation
If you find this project useful in your research, please cite us via
```
@misc{flux-cn-union-pro-2,
author = {Shakker-Labs},
title = {https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0},
year = {2025},
}
```