File size: 9,189 Bytes
44c3fad
 
 
 
 
 
 
 
 
d5b05da
1582296
 
44c3fad
 
 
 
 
 
8cf6344
ba41584
c3926da
 
81accec
8cf6344
 
 
44c3fad
 
8cf6344
44c3fad
8cf6344
f3a39d4
ba41584
 
 
 
 
 
 
 
 
 
 
f3a39d4
 
 
44c3fad
8cf6344
ce07205
44c3fad
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
22ae811
44c3fad
22ae811
 
44c3fad
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
22ae811
44c3fad
22ae811
 
44c3fad
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
f3a39d4
 
 
 
 
 
 
 
 
74ab735
f3a39d4
 
 
 
 
 
 
 
 
 
 
ce07205
44c3fad
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
---
license: other
license_name: flux-1-dev-non-commercial-license
license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
tags:
- quantized
- fp8
- e4m3
- Text-to-Image
- ControlNet
- Diffusers
- Flux.1-dev
- image-generation
- Stable Diffusion
- quantization
- reduced-precision
base_model:
- Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0
base_model_relation: quantized
inference:
  parameters:
    torch_dtype: torch.float8_e4m3fn
---

# FLUX.1-dev-ControlNet-Union-Pro-2.0 (FP8 Quantized)

This repository contains an FP8 quantized version of the [Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0](https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0) model. **This is NOT a fine-tuned model** but a direct quantization of the original BFloat16 model to FP8 format for optimized inference performance. We provide an [online demo](https://huggingface.co/spaces/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0).

# Quantization Details
This model has been quantized from the original BFloat16 format to FP8 format using PyTorch's native FP8 support. Here are the specifics:

- **Quantization Technique**: Native FP8 quantization
- **Precision**: E4M3 format (4 bits for exponent, 3 bits for mantissa)
- **Library Used**: PyTorch's built-in FP8 support
- **Data Type**: `torch.float8_e4m3fn`
- **Original Model**: BFloat16 format (Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0)
- **Model Size Reduction**: ~50% smaller than the original model

The benefits of FP8 quantization include:
- **Reduced Memory Usage**: Approximately 50% smaller model size compared to BFloat16/FP16
- **Faster Inference**: Potential speed improvements, especially on hardware with FP8 support
- **Minimal Quality Loss**: Carefully calibrated quantization process to preserve output quality

**Important Note**: This is a direct quantization of [Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0](https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0) and preserves all the functionality of the original model. No fine-tuning or additional training has been performed.

# Keynotes
In comparison with [Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro](https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro),
- Remove mode embedding, has smaller model size.
- Improve on canny and pose, better control and aesthetics.
- Add support for soft edge. Remove support for tile.

# Model Cards
- This ControlNet consists of 6 double blocks and 0 single block. Mode embedding is removed.
- We train the model from scratch for 300k steps using a dataset of 20M high-quality general and human images. We train at 512x512 resolution in BFloat16, batch size = 128, learning rate = 2e-5, the guidance is uniformly sampled from [1, 7]. We set the text drop ratio to 0.20.
- This model supports multiple control modes, including canny, soft edge, depth, pose, gray. You can use it just as a normal ControlNet.
- This model can be jointly used with other ControlNets.

# Showcases

<table>
  <tr>
    <td><img src="./images/canny.png" alt="canny" style="height:100%"></td>
  </tr>
  <tr>
    <td><img src="./images/softedge.png" alt="softedge" style="height:100%"></td>
  </tr>
  <tr>
    <td><img src="./images/pose.png" alt="pose" style="height:100%"></td>
  </tr>
  <tr>
    <td><img src="./images/depth.png" alt="depth" style="height:100%"></td>
  </tr>
  <tr>
    <td><img src="./images/gray.png" alt="gray" style="height:100%"></td>
  </tr>
</table>
 
# Inference
```python
import torch
from diffusers.utils import load_image
from diffusers import FluxControlNetPipeline, FluxControlNetModel

base_model = 'black-forest-labs/FLUX.1-dev'
controlnet_model_union_fp8 = 'ABDALLALSWAITI/FLUX.1-dev-ControlNet-Union-Pro-2.0-fp8'

# Load using FP8 data type
controlnet = FluxControlNetModel.from_pretrained(controlnet_model_union_fp8, torch_dtype=torch.float8_e4m3fn)
pipe = FluxControlNetPipeline.from_pretrained(base_model, controlnet=controlnet, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# replace with other conds
control_image = load_image("./conds/canny.png")
width, height = control_image.size

prompt = "A young girl stands gracefully at the edge of a serene beach, her long, flowing hair gently tousled by the sea breeze. She wears a soft, pastel-colored dress that complements the tranquil blues and greens of the coastal scenery. The golden hues of the setting sun cast a warm glow on her face, highlighting her serene expression. The background features a vast, azure ocean with gentle waves lapping at the shore, surrounded by distant cliffs and a clear, cloudless sky. The composition emphasizes the girl's serene presence amidst the natural beauty, with a balanced blend of warm and cool tones."

image = pipe(
    prompt, 
    control_image=control_image,
    width=width,
    height=height,
    controlnet_conditioning_scale=0.7,
    control_guidance_end=0.8,
    num_inference_steps=30, 
    guidance_scale=3.5,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
```

# Multi-Inference
```python
import torch
from diffusers.utils import load_image

# use local files for this moment
from pipeline_flux_controlnet import FluxControlNetPipeline
from controlnet_flux import FluxControlNetModel

base_model = 'black-forest-labs/FLUX.1-dev'
controlnet_model_union_fp8 = 'ABDALLALSWAITI/FLUX.1-dev-ControlNet-Union-Pro-2.0-fp8'

# Load using FP8 data type
controlnet = FluxControlNetModel.from_pretrained(controlnet_model_union_fp8, torch_dtype=torch.float8_e4m3fn)
pipe = FluxControlNetPipeline.from_pretrained(base_model, controlnet=[controlnet], torch_dtype=torch.bfloat16) # use [] to enable multi-CNs
pipe.to("cuda")

# replace with other conds
control_image = load_image("./conds/canny.png")
width, height = control_image.size

prompt = "A young girl stands gracefully at the edge of a serene beach, her long, flowing hair gently tousled by the sea breeze. She wears a soft, pastel-colored dress that complements the tranquil blues and greens of the coastal scenery. The golden hues of the setting sun cast a warm glow on her face, highlighting her serene expression. The background features a vast, azure ocean with gentle waves lapping at the shore, surrounded by distant cliffs and a clear, cloudless sky. The composition emphasizes the girl's serene presence amidst the natural beauty, with a balanced blend of warm and cool tones."

image = pipe(
    prompt, 
    control_image=[control_image, control_image], # try with different conds such as canny&depth, pose&depth
    width=width,
    height=height,
    controlnet_conditioning_scale=[0.35, 0.35],
    control_guidance_end=[0.8, 0.8],
    num_inference_steps=30, 
    guidance_scale=3.5,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
```

# Recommended Parameters
You can adjust controlnet_conditioning_scale and control_guidance_end for stronger control and better detail preservation. For better stability, we highly suggest to use detailed prompt, for some cases, multi-conditions help.
- Canny: use cv2.Canny, controlnet_conditioning_scale=0.7, control_guidance_end=0.8.
- Soft Edge: use [AnylineDetector](https://github.com/huggingface/controlnet_aux), controlnet_conditioning_scale=0.7, control_guidance_end=0.8.
- Depth: use [depth-anything](https://github.com/DepthAnything/Depth-Anything-V2), controlnet_conditioning_scale=0.8, control_guidance_end=0.8.
- Pose: use [DWPose](https://github.com/IDEA-Research/DWPose/tree/onnx), controlnet_conditioning_scale=0.9, control_guidance_end=0.65.
- Gray: use cv2.cvtColor, controlnet_conditioning_scale=0.9, control_guidance_end=0.8.

# Using FP8 Model
This repository includes the FP8 quantized version of the model. To use it, you'll need PyTorch with FP8 support:

```python
import torch
from diffusers.utils import load_image
from diffusers import FluxControlNetPipeline, FluxControlNetModel

base_model = 'black-forest-labs/FLUX.1-dev'
controlnet_model_union_fp8 = 'ABDALLALSWAITI/FLUX.1-dev-ControlNet-Union-Pro-2.0-fp8'

# Load using FP8 data type
controlnet = FluxControlNetModel.from_pretrained(controlnet_model_union_fp8, torch_dtype=torch.float8_e4m3fn)
pipe = FluxControlNetPipeline.from_pretrained(base_model, controlnet=controlnet, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# The rest of the code is the same as with the original model
```

See `fp8_inference_example.py` for a complete example.


# Resources
- [InstantX/FLUX.1-dev-IP-Adapter](https://huggingface.co/InstantX/FLUX.1-dev-IP-Adapter)
- [InstantX/FLUX.1-dev-Controlnet-Canny](https://huggingface.co/InstantX/FLUX.1-dev-Controlnet-Canny)
- [Shakker-Labs/FLUX.1-dev-ControlNet-Depth](https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Depth)
- [Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro](https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro)

# Acknowledgements
This model is developed by [Shakker Labs](https://huggingface.co/Shakker-Labs). The original idea is inspired by [xinsir/controlnet-union-sdxl-1.0](https://huggingface.co/xinsir/controlnet-union-sdxl-1.0). All copyright reserved.