DFloat11 Compressed Model: lodestones/Chroma

This is a DFloat11 losslessly compressed version of the original lodestones/Chroma (v39) model. It reduces model size by 32% compared to the original BFloat16 model, while maintaining bit-identical outputs and supporting efficient GPU inference.

🔥🔥🔥 Thanks to DFloat11 compression, Chroma can now run smoothly on a single 16 GB GPU without any quality loss. 🔥🔥🔥

📊 Performance Comparison

| Metric | Chroma (BFloat16) | Chroma (DFloat11) |
|---|---|---|
| Model Size | 17.80 GB | 12.16 GB |
| Peak GPU Memory (1024×1024 image generation) | 18.33 GB | 13.26 GB |
| Generation Time (A100 GPU) | 56 seconds | 59 seconds |
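
To make the trade-off in the table concrete, the short sketch below recomputes the relative changes from the reported figures. The dictionary keys and the `pct_change` helper are illustrative names, not part of the DFloat11 package, and the 16 GB check is a plain comparison against the reported peak memory.

```python
# Figures copied from the comparison table above.
bf16 = {"size_gb": 17.80, "peak_mem_gb": 18.33, "time_s": 56}
df11 = {"size_gb": 12.16, "peak_mem_gb": 13.26, "time_s": 59}


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


size_change = pct_change(bf16["size_gb"], df11["size_gb"])
mem_change = pct_change(bf16["peak_mem_gb"], df11["peak_mem_gb"])
time_change = pct_change(bf16["time_s"], df11["time_s"])

# Does the reported peak memory fit under a 16 GB budget?
fits_16gb = df11["peak_mem_gb"] < 16.0

print(f"Model size:      {size_change:+.1f}%")   # -31.7%
print(f"Peak GPU memory: {mem_change:+.1f}%")    # -27.7%
print(f"Generation time: {time_change:+.1f}%")   # +5.4%
print(f"Fits in 16 GB:   {fits_16gb}")           # True
```

On these numbers, the roughly 32% smaller model costs about 5% more generation time on the A100.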

🔧 How to Use

  1. Install or upgrade the DFloat11 pip package (installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed):

    pip install -U dfloat11[cuda12]
    # or if you have CUDA version 11:
    # pip install -U dfloat11[cuda11]
    
  2. Install or upgrade the diffusers library.

    pip install -U diffusers
    
  3. To use the DFloat11 model, run the following example code in Python:

    import torch
    from diffusers import ChromaTransformer2DModel, ChromaPipeline
    from transformers.modeling_utils import no_init_weights
    from dfloat11 import DFloat11Model
    
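    # Build the transformer skeleton without spending time on weight initialization;
    # the real weights are filled in from the DFloat11 checkpoint below.
    with no_init_weights():
        transformer = ChromaTransformer2DModel().to(torch.bfloat16)
    
    # Load the DFloat11-compressed weights into the transformer, keeping it on the CPU for now.
    DFloat11Model.from_pretrained(
        "DFloat11/Chroma-DF11",
        bfloat16_model=transformer,
        device="cpu",
    )
    
    # Assemble the pipeline around the compressed transformer. With CPU offload enabled,
    # each component is moved to the GPU only while it is in use.
    pipe = ChromaPipeline.from_pretrained("lodestones/Chroma", transformer=transformer, torch_dtype=torch.bfloat16)
    pipe.enable_model_cpu_offload()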
    
    prompt = [
        "A high-fashion close-up portrait of a blonde woman in clear sunglasses. The image uses a bold teal and red color split for dramatic lighting. The background is a simple teal-green. The photo is sharp and well-composed, and is designed for viewing with anaglyph 3D glasses for optimal effect. It looks professionally done."
    ]
    negative_prompt = ["low quality, ugly, unfinished, out of focus, deformed, disfigure, blurry, smudged, restricted palette, flat colors"]
    
    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        generator=torch.Generator("cpu").manual_seed(433),
        num_inference_steps=40,
        guidance_scale=3.0,
    ).images[0]
    
    image.save("chroma-output.png")
    

🔍 How It Works

We apply Huffman coding to losslessly compress the exponent bits of BFloat16 model weights, which are highly compressible (their 8 bits carry only ~2.6 bits of actual information). To enable fast inference, we implement a highly efficient CUDA kernel that performs on-the-fly weight decompression directly on the GPU.

The result is a model that is ~32% smaller, delivers bit-identical outputs, and achieves performance comparable to the original BFloat16 model.
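
The exponent-compression idea described above can be illustrated with a small, CPU-only sketch. It does not use the DFloat11 package or its CUDA kernel, and it does not touch the real Chroma weights: it assumes synthetic Gaussian values as a stand-in, truncates float32 to BFloat16 (a simplification of proper rounding), and builds a textbook Huffman code over the 8 exponent bits. The helper names `bf16_exponent` and `huffman_code_lengths` are hypothetical.

```python
import heapq
import math
import random
import struct
from collections import Counter


def bf16_exponent(x: float) -> int:
    """Return the 8 exponent bits of x after truncating float32 to BFloat16."""
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    bits16 = bits32 >> 16          # BFloat16 keeps the top 16 bits of float32
    return (bits16 >> 7) & 0xFF    # layout: 1 sign | 8 exponent | 7 mantissa


def huffman_code_lengths(freqs: Counter) -> dict:
    """Build a Huffman tree and return {symbol: code length in bits}."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


# Assumption: synthetic Gaussian values stand in for real model weights.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(100_000)]

freqs = Counter(bf16_exponent(w) for w in weights)
total = sum(freqs.values())

# Shannon entropy of the exponent field, in bits per weight.
entropy = -sum((c / total) * math.log2(c / total) for c in freqs.values())

# Average Huffman code length for the exponent field.
lengths = huffman_code_lengths(freqs)
avg_exp_bits = sum(freqs[s] * lengths[s] for s in freqs) / total

# Sign (1 bit) and mantissa (7 bits) are stored as-is; only the exponent is coded.
bits_per_weight = 1 + 7 + avg_exp_bits

print(f"Exponent entropy:        {entropy:.2f} bits (out of 8)")
print(f"Huffman-coded exponent:  {avg_exp_bits:.2f} bits on average")
print(f"Total per weight:        {bits_per_weight:.2f} bits (BFloat16 uses 16)")
```

If the coded exponent averages close to 3 bits, each weight takes about 11 of its original 16 bits, which is about 31% smaller and in the same range as the ~32% reduction reported here. The exact figures printed depend on the weight distribution, so the synthetic values should not be read as measurements of Chroma.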

Learn more in our research paper.

📄 Learn More
