DFloat11 Compressed Model: lodestones/Chroma
This is a DFloat11 losslessly compressed version of the original lodestones/Chroma (v39) model. It reduces model size by 32% compared to the original BFloat16 model while maintaining bit-identical outputs and supporting efficient GPU inference.
Thanks to DFloat11 compression, Chroma can now run smoothly on a single 16 GB GPU without any quality loss.
Performance Comparison
| Metric | Chroma (BFloat16) | Chroma (DFloat11) |
|---|---|---|
| Model Size | 17.80 GB | 12.16 GB |
| Peak GPU Memory (1024×1024 image generation) | 18.33 GB | 13.26 GB |
| Generation Time (A100 GPU) | 56 seconds | 59 seconds |
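The relative changes implied by the table can be checked with a few lines of Python. This is a minimal sketch that simply hard-codes the numbers from the table above; the names `measurements` and `pct_change` are illustrative and not part of DFloat11. The 16 GB comparison assumes the table's GB and the GPU's advertised 16 GB use the same unit.

```python
# Figures copied from the performance comparison table: (BFloat16, DFloat11).
measurements = {
    "model_size_gb": (17.80, 12.16),
    "peak_gpu_memory_gb": (18.33, 13.26),
    "generation_time_s": (56.0, 59.0),
}


def pct_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (negative means a reduction)."""
    return (after - before) / before * 100.0


for metric, (bf16, df11) in measurements.items():
    print(f"{metric}: {pct_change(bf16, df11):+.1f}%")

# Does each peak fit under a 16 GB budget? (Assumes matching units.)
budget_gb = 16.0
bf16_peak, df11_peak = measurements["peak_gpu_memory_gb"]
print(f"BFloat16 fits in {budget_gb:g} GB: {bf16_peak <= budget_gb}")
print(f"DFloat11 fits in {budget_gb:g} GB: {df11_peak <= budget_gb}")
```

This prints a 31.7% model-size reduction, a 27.7% peak-memory reduction, and a 5.4% increase in generation time; only the DFloat11 peak falls under 16 GB.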
How to Use
Install or upgrade the DFloat11 pip package. This installs the CUDA kernel automatically and requires a CUDA-compatible GPU and an existing PyTorch installation:
pip install -U "dfloat11[cuda12]"
# or, if you have CUDA 11:
# pip install -U "dfloat11[cuda11]"
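If you are unsure which extra matches your setup, PyTorch reports the CUDA version it was built against in `torch.version.cuda` (a string such as "12.1", or `None` for CPU-only builds). The helper below is a hypothetical convenience, not part of the dfloat11 package; it assumes that only the `cuda11` and `cuda12` extras shown above exist and that the extra should match PyTorch's CUDA major version.

```python
from typing import Optional


def pick_dfloat11_spec(torch_cuda_version: Optional[str]) -> str:
    """Map PyTorch's CUDA version string to a dfloat11 install spec (hypothetical helper)."""
    if torch_cuda_version is None:
        raise RuntimeError("This PyTorch build has no CUDA support; DFloat11 needs a CUDA GPU.")
    major = int(torch_cuda_version.split(".")[0])
    if major not in (11, 12):
        raise RuntimeError(f"No dfloat11 extra listed for CUDA {torch_cuda_version}.")
    return f"dfloat11[cuda{major}]"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; install it before dfloat11.")
    else:
        try:
            spec = pick_dfloat11_spec(torch.version.cuda)
        except RuntimeError as err:
            print(err)
        else:
            print(f'pip install -U "{spec}"')
```

Matching the CUDA major version of the PyTorch build (rather than the system driver) is an assumption here; check the dfloat11 project's own instructions if the two differ on your machine.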
Install or upgrade the diffusers library.
pip install -U diffusers
To use the DFloat11 model, run the following example code in Python:
import torch
from diffusers import ChromaTransformer2DModel, ChromaPipeline
from transformers.modeling_utils import no_init_weights
from dfloat11 import DFloat11Model

# Build the BFloat16 transformer skeleton without spending time on random weight initialization
with no_init_weights():
    transformer = ChromaTransformer2DModel().to(torch.bfloat16)

# Attach the DFloat11-compressed weights to the BFloat16 transformer
DFloat11Model.from_pretrained(
    "DFloat11/Chroma-DF11",
    bfloat16_model=transformer,
    device="cpu",
)

# Load the remaining pipeline components from the original repository, reusing the DF11 transformer
pipe = ChromaPipeline.from_pretrained(
    "lodestones/Chroma",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
# Keep components on the CPU and move each to the GPU only while it runs, lowering peak GPU memory
pipe.enable_model_cpu_offload()

prompt = [
    "A high-fashion close-up portrait of a blonde woman in clear sunglasses. The image uses a bold teal and red color split for dramatic lighting. The background is a simple teal-green. The photo is sharp and well-composed, and is designed for viewing with anaglyph 3D glasses for optimal effect. It looks professionally done."
]
negative_prompt = ["low quality, ugly, unfinished, out of focus, deformed, disfigure, blurry, smudged, restricted palette, flat colors"]

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    generator=torch.Generator("cpu").manual_seed(433),
    num_inference_steps=40,
    guidance_scale=3.0,
).images[0]

image.save("chroma-output.png")
How It Works
We apply Huffman coding to losslessly compress the exponent bits of BFloat16 model weights, which are highly compressible (their 8 bits carry only ~2.6 bits of actual information). To enable fast inference, we implement a highly efficient CUDA kernel that performs on-the-fly weight decompression directly on the GPU.
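The principle can be illustrated with a small pure-Python sketch. It is not the DFloat11 implementation (which decompresses on the GPU with its own kernel and data layout); it only demonstrates the idea on synthetic data. Assumptions: weights are drawn from a Gaussian with standard deviation 0.02 as a stand-in for real model weights, BFloat16 values are obtained by truncating float32 (real conversions usually round), and the Huffman code is built with Python's `heapq`. Sign and mantissa bits are stored untouched; only the 8 exponent bits are entropy-coded, and decoding reassembles bit-identical BFloat16 words.

```python
import heapq
import math
import random
import struct
from collections import Counter


def to_bf16_bits(x: float) -> int:
    """Return the 16-bit BFloat16 pattern of x by truncating its float32 encoding."""
    (f32_bits,) = struct.unpack(">I", struct.pack(">f", x))
    return f32_bits >> 16


def huffman_code(freqs: Counter) -> dict:
    """Build a prefix-free Huffman code {symbol: bitstring} from symbol counts."""
    if len(freqs) == 1:  # degenerate case: a single symbol still needs one bit
        return {next(iter(freqs)): "0"}
    # Heap entries: (count, unique tie-breaker, {symbol: partial_code})
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


def decode(bits: str, code: dict) -> list:
    """Decode a concatenated bitstring using a prefix-free code."""
    inverse = {c: s for s, c in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return out


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(20_000)]
words = [to_bf16_bits(w) for w in weights]

# Split each BFloat16 word into 1 sign bit, 8 exponent bits, 7 mantissa bits.
signs = [w >> 15 for w in words]
exponents = [(w >> 7) & 0xFF for w in words]
mantissas = [w & 0x7F for w in words]

counts = Counter(exponents)
total = len(exponents)
entropy = -sum(n / total * math.log2(n / total) for n in counts.values())

code = huffman_code(counts)
encoded = "".join(code[e] for e in exponents)
avg_len = len(encoded) / total

# Lossless check: rebuild each BFloat16 word from sign, decoded exponent, mantissa.
decoded = decode(encoded, code)
rebuilt = [(s << 15) | (e << 7) | m for s, e, m in zip(signs, decoded, mantissas)]
assert rebuilt == words

print(f"exponent entropy: {entropy:.2f} bits (stored in 8 bits)")
print(f"average Huffman code length: {avg_len:.2f} bits")
print(f"bits per weight: 16 -> {1 + 7 + avg_len:.2f}")
```

Because a Huffman code is optimal among prefix codes, its average length lands within one bit of the exponent entropy, and the final assertion confirms that the round trip reproduces every BFloat16 bit pattern exactly.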
The result is a model that is ~32% smaller and produces bit-identical outputs, with generation speed close to that of the original BFloat16 model (59 vs. 56 seconds on an A100 in the performance comparison above).
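A back-of-the-envelope estimate shows roughly where the ~32% figure comes from. This is an idealized calculation, assuming every weight keeps its 1 sign bit and 7 mantissa bits uncompressed and its exponent is coded at about the ~2.6 bits of information mentioned above; it ignores any storage for code tables or other metadata.

```latex
\text{bits per weight} \approx \underbrace{1}_{\text{sign}} + \underbrace{7}_{\text{mantissa}} + \underbrace{2.6}_{\text{coded exponent}} = 10.6,
\qquad
\frac{10.6}{16} \approx 0.66 \;\Rightarrow\; \text{about } 34\% \text{ smaller}
```

The measured reduction (12.16 / 17.80 ≈ 0.683, i.e. about 32%) is slightly smaller than this idealized estimate, which is consistent with some overhead the estimate leaves out.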
Learn more in our research paper.