DFloat11 Compressed Model: black-forest-labs/FLUX.1-Kontext-dev

This is a DFloat11 losslessly compressed version of the original black-forest-labs/FLUX.1-Kontext-dev model. It reduces model size by 32% compared to the original BFloat16 model, while maintaining bit-identical outputs and supporting efficient GPU inference.

πŸ”₯πŸ”₯πŸ”₯ Thanks to DFloat11 compression, FLUX.1-Kontext-dev can now run smoothly on a single 24GB GPU without any quality loss. πŸ”₯πŸ”₯πŸ”₯

πŸ“Š Performance Comparison

| Metric | FLUX.1-Kontext-dev (BFloat16) | FLUX.1-Kontext-dev (DFloat11) |
|---|---|---|
| Model Size | 23.80 GB | 16.33 GB |
| Peak GPU Memory (1024Γ—1024 image generation) | 24.86 GB | 18.12 GB |
| Generation Time (A100 GPU) | 72 seconds | 83 seconds |
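
If you want to check whether your GPU fits these numbers before installing, the snippet below is a minimal sketch (assuming PyTorch with CUDA support is already installed). It reports the detected CUDA version, which also indicates whether to pick the cuda11 or cuda12 extra in the next section, and compares total GPU memory against the ~18.12 GB peak from the table:

    import torch

    # Minimal environment check (assumes PyTorch was built with CUDA support).
    assert torch.cuda.is_available(), "A CUDA-compatible GPU is required."

    # The CUDA version indicates which dfloat11 extra to install (cuda11 vs cuda12).
    print(f"PyTorch CUDA version: {torch.version.cuda}")

    props = torch.cuda.get_device_properties(0)
    total_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}, total memory: {total_gb:.2f} GB")

    # Peak usage for 1024x1024 generation with DFloat11 is ~18.12 GB (see table above).
    print("Fits the DFloat11 peak (~18.12 GB):", total_gb >= 18.12)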

πŸ”§ How to Use

  1. Install or upgrade the DFloat11 pip package (this installs the CUDA kernel automatically; it requires a CUDA-compatible GPU and an existing PyTorch installation):

    pip install -U dfloat11[cuda12]
    # or if you have CUDA version 11:
    # pip install -U dfloat11[cuda11]
    
  2. Install diffusers from the main branch (required until support is included in a stable release):

    pip install git+https://github.com/huggingface/diffusers.git
    
  3. To use the DFloat11 model, run the following example code in Python:

    import torch
    from diffusers import FluxKontextPipeline
    from diffusers.utils import load_image
    from dfloat11 import DFloat11Model
    
    # Load the full pipeline in BFloat16, then swap the transformer's weights
    # for the DFloat11-compressed version hosted at DFloat11/FLUX.1-Kontext-dev-DF11.
    pipe = FluxKontextPipeline.from_pretrained("black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16)
    DFloat11Model.from_pretrained(
        "DFloat11/FLUX.1-Kontext-dev-DF11",
        device="cpu",
        bfloat16_model=pipe.transformer,
    )
    # Keep components on CPU and move them to the GPU only when they are used,
    # so peak GPU memory stays within a single 24GB card.
    pipe.enable_model_cpu_offload()
    
    input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")
    
    image = pipe(
        image=input_image,
        prompt="Add a hat to the cat",
        guidance_scale=2.5,
    ).images[0]
    
    image.save("kontext.png")
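
To reproduce numbers like those in the performance table, you can wrap the pipeline call with simple timing and peak-memory tracking. The sketch below reuses the pipe and input_image objects from the example above; absolute numbers will vary with your GPU and settings:

    import time
    import torch

    # Reset the peak-memory counter, then time a single image generation.
    torch.cuda.reset_peak_memory_stats()
    start = time.time()

    image = pipe(
        image=input_image,
        prompt="Add a hat to the cat",
        guidance_scale=2.5,
    ).images[0]

    elapsed = time.time() - start
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    print(f"Generation time: {elapsed:.1f} s, peak GPU memory: {peak_gb:.2f} GB")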
    

πŸ” How It Works

We apply Huffman coding to losslessly compress the exponent bits of BFloat16 model weights, which are highly compressible (their 8 bits carry only ~2.6 bits of actual information). To enable fast inference, we implement a highly efficient CUDA kernel that performs on-the-fly weight decompression directly on the GPU.
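
As a rough illustration of why the exponent bits compress so well, the sketch below estimates the empirical Shannon entropy of the exponent field of a BFloat16 tensor. It uses randomly initialized values as a stand-in for real trained weights, so the exact figure will differ from the ~2.6 bits observed on actual model weights:

    import torch

    # Stand-in for trained weights; real checkpoints show roughly 2.6 bits of entropy.
    w = torch.randn(1_000_000, dtype=torch.bfloat16)

    # Reinterpret the 16-bit pattern; bits 14-7 hold the 8-bit exponent field.
    bits = w.view(torch.int16).to(torch.int32) & 0xFFFF
    exponents = ((bits >> 7) & 0xFF).long()

    # Shannon entropy H = -sum(p * log2(p)) over the 256 possible exponent values.
    counts = torch.bincount(exponents, minlength=256).float()
    probs = counts[counts > 0] / counts.sum()
    entropy = -(probs * probs.log2()).sum().item()

    print(f"Empirical entropy of the exponent bits: {entropy:.2f} / 8 bits")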

The result is a model that is ~32% smaller, delivers bit-identical outputs, and achieves performance comparable to the original BFloat16 model.

Learn more in our research paper.
