---
base_model: black-forest-labs/FLUX.1-dev
datasets: TIGER-Lab/OmniEdit-Filtered-1.2M
library_name: diffusers
license: other
inference: true
tags:
- flux
- flux-diffusers
- text-to-image
- diffusers
- control
- diffusers-training
widget:
- text: Give this the look of a traditional Japanese woodblock print.
  output:
    url: >-
      https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_car.jpg
- text: transform the setting to a winter scene
  output:
    url: >-
      https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_green_creature.jpg
- text: turn the color of mushroom to gray
  output:
    url: >-
      https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_mushroom.jpg
- text: Change it to look like it's in the style of an impasto painting.
  output:
    url: >-
      https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_norte_dam.jpg
---

# Flux Edit

These are the control weights trained on [black-forest-labs/FLUX.1-dev](https://hf.co/black-forest-labs/FLUX.1-dev) and [TIGER-Lab/OmniEdit-Filtered-1.2M](https://huggingface.co/datasets/TIGER-Lab/OmniEdit-Filtered-1.2M) for image editing. We use the [Flux Control framework](https://blackforestlabs.ai/flux-1-tools/) for fine-tuning.

## License

Please adhere to the licensing terms as described [here](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md).

## Intended uses & limitations

### Inference

```py
from diffusers import FluxControlPipeline, FluxTransformer2DModel
from diffusers.utils import load_image
import torch

path = "sayakpaul/FLUX.1-dev-edit-v0"
edit_transformer = FluxTransformer2DModel.from_pretrained(path, torch_dtype=torch.bfloat16)
pipeline = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=edit_transformer, torch_dtype=torch.bfloat16
).to("cuda")

url = "https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/assets/mushroom.jpg"
image = load_image(url)  # resize as needed.
print(image.size)

prompt = "turn the color of mushroom to gray"
image = pipeline(
    control_image=image,
    prompt=prompt,
    guidance_scale=30.,  # change this as needed.
    num_inference_steps=50,  # change this as needed.
    max_sequence_length=512,
    height=image.height,
    width=image.width,
    generator=torch.manual_seed(0)
).images[0]
image.save("edited_image.png")
```

### Speeding up inference with a turbo LoRA

We can speed up inference by reducing `num_inference_steps` while still producing a good image, by using a turbo LoRA such as [`ByteDance/Hyper-SD`](https://hf.co/ByteDance/Hyper-SD). Make sure to install `peft` before running the code below: `pip install -U peft`.
```py
from diffusers import FluxControlPipeline, FluxTransformer2DModel
from diffusers.utils import load_image
from huggingface_hub import hf_hub_download
import torch

path = "sayakpaul/FLUX.1-dev-edit-v0"
edit_transformer = FluxTransformer2DModel.from_pretrained(path, torch_dtype=torch.bfloat16)
pipeline = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=edit_transformer, torch_dtype=torch.bfloat16
).to("cuda")

# load the turbo LoRA
pipeline.load_lora_weights(
    hf_hub_download("ByteDance/Hyper-SD", "Hyper-FLUX.1-dev-8steps-lora.safetensors"),
    adapter_name="hyper-sd",
)
pipeline.set_adapters(["hyper-sd"], adapter_weights=[0.125])

url = "https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/assets/mushroom.jpg"
image = load_image(url)  # resize as needed.
print(image.size)

prompt = "turn the color of mushroom to gray"
image = pipeline(
    control_image=image,
    prompt=prompt,
    guidance_scale=30.,  # change this as needed.
    num_inference_steps=8,  # change this as needed.
    max_sequence_length=512,
    height=image.height,
    width=image.width,
    generator=torch.manual_seed(0)
).images[0]
image.save("edited_image.png")
```
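Both snippets load the control image at its native resolution ("resize as needed"). If you do resize, it helps to keep the dimensions divisible by 16, since Flux's VAE downsamples by 8x and the transformer packs 2x2 latent patches. A minimal sketch of such a helper (the function name and `max_side` cap are our own illustration, not part of this repository):

```py
from diffusers.utils import load_image

def resize_to_multiple_of_16(image, max_side=1024):
    # Optionally cap the longest side, then snap both dimensions
    # down to the nearest multiple of 16.
    scale = min(max_side / max(image.size), 1.0)
    width = int(image.width * scale) // 16 * 16
    height = int(image.height * scale) // 16 * 16
    return image.resize((width, height))

image = load_image(
    "https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/assets/mushroom.jpg"
)
image = resize_to_multiple_of_16(image)
print(image.size)
```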

**Comparison**

*(Image grid: four example edits, each shown side by side at 50 inference steps without the turbo LoRA and at 8 inference steps with it.)*
You can also quantize the model if your hardware cannot meet the memory requirements. Refer to the [Diffusers documentation](https://huggingface.co/docs/diffusers/main/en/quantization/overview) to learn more.
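As an illustration, here is a minimal sketch of loading the edit transformer in 4-bit through the Diffusers `bitsandbytes` integration (this assumes `bitsandbytes` and `accelerate` are installed; it is one possible configuration, not the only one):

```py
import torch
from diffusers import BitsAndBytesConfig, FluxControlPipeline, FluxTransformer2DModel

# 4-bit NF4 quantization for the transformer, computing in bf16.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
edit_transformer = FluxTransformer2DModel.from_pretrained(
    "sayakpaul/FLUX.1-dev-edit-v0",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
pipeline = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=edit_transformer, torch_dtype=torch.bfloat16
)
# Keep non-active components on CPU to further reduce peak VRAM.
pipeline.enable_model_cpu_offload()
```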
`guidance_scale` also impacts the results:

*(Image grid: collages of edits at `guidance_scale` 10, 20, 30, and 40 for the prompts "Give this the look of a traditional Japanese woodblock print.", "transform the setting to a winter scene", and "turn the color of mushroom to gray".)*
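To reproduce such a sweep yourself, here is a minimal sketch that reuses `pipeline` and `image` from the inference snippet above:

```py
import torch

prompt = "turn the color of mushroom to gray"
edits = {}
for gs in (10.0, 20.0, 30.0, 40.0):
    edits[gs] = pipeline(
        control_image=image,
        prompt=prompt,
        guidance_scale=gs,
        num_inference_steps=50,
        max_sequence_length=512,
        height=image.height,
        width=image.width,
        generator=torch.manual_seed(0),  # fixed seed so only guidance_scale varies
    ).images[0]
    edits[gs].save(f"edited_gs_{int(gs)}.png")
```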
### Limitations and bias

Expect the model to underperform at times, since the exact training details of Flux Control are not known to us.

## Training details

The fine-tuning codebase is [here](https://github.com/sayakpaul/flux-image-editing). Training hyperparameters:

* Per GPU batch size: 4
* Gradient accumulation steps: 4
* Guidance scale: 30
* BF16 mixed-precision
* AdamW optimizer (8-bit, from `bitsandbytes`)
* Constant learning rate of 5e-5
* Weight decay of 1e-6
* 20000 training steps

Training was conducted using a node of 8xH100s.

We used a simplified flow mechanism to perform the linear interpolation. In pseudo-code, that looks like:

```py
# Sample a noise level uniformly per sample and map it to a discrete timestep.
sigmas = torch.rand(batch_size)
timesteps = (sigmas * noise_scheduler.config.num_train_timesteps).long()
...
# Linearly interpolate between the clean latents and Gaussian noise.
noisy_model_input = (1.0 - sigmas) * pixel_latents + sigmas * noise
```

where `pixel_latents` is computed from the source images and `noise` is drawn from a Gaussian distribution. For more details, [check out the repository](https://github.com/sayakpaul/flux-image-editing/blob/b041f62df8f959dc3b2f324d2bfdcdf3a6388598/train.py#L403).
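For intuition, here is a slightly fuller sketch of that step with the broadcasting made explicit. The `sigmas` reshape and the velocity target `noise - pixel_latents` are assumptions based on common flow-matching recipes for Flux fine-tuning, not a verbatim excerpt from the training script:

```py
import torch

def flow_matching_step(pixel_latents, noise_scheduler):
    # Uniform noise levels in [0, 1), one per sample in the batch.
    batch_size = pixel_latents.shape[0]
    sigmas = torch.rand(batch_size, device=pixel_latents.device)
    timesteps = (sigmas * noise_scheduler.config.num_train_timesteps).long()

    # Broadcast sigmas over the latent dimensions (B, C, H, W).
    sigmas = sigmas.view(-1, 1, 1, 1)

    noise = torch.randn_like(pixel_latents)
    # Linear interpolation between clean latents and noise.
    noisy_model_input = (1.0 - sigmas) * pixel_latents + sigmas * noise

    # Assumed flow-matching velocity target: direction from data to noise.
    target = noise - pixel_latents
    return noisy_model_input, timesteps, target
```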