DFloat11 Compressed Model: OmniGen2/OmniGen2 MLLM

This is a DFloat11 losslessly compressed version of the original OmniGen2/OmniGen2 model. It reduces model size by 32% compared to the original BFloat16 model, while maintaining bit-identical outputs and supporting efficient GPU inference.

πŸ“Š Performance Comparison

| Metric | OmniGen2 (BFloat16) | OmniGen2 (DFloat11) |
| --- | --- | --- |
| Model Size | 16.23 GB | 11.11 GB |
| Peak GPU Memory (1024×1024 image generation) | 18.41 GB | 14.36 GB |
| Generation Time (A100 GPU) | 25 seconds | 27 seconds |
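
The headline 32% reduction can be checked directly from the table. A small Python snippet (values copied from the table above) also shows the effective bits per weight, consistent with the ~11 bits implied by the name DFloat11:

```python
original_gb = 16.23    # BFloat16 model size, from the table above
compressed_gb = 11.11  # DFloat11 model size, from the table above

# Fractional size reduction: 1 - 11.11 / 16.23 ~= 0.315, i.e. ~32%.
print(f"size reduction: {1 - compressed_gb / original_gb:.1%}")

# Equivalent bits per weight: 16 bits scaled by the compression ratio,
# ~= 10.9 bits -- consistent with the "11" in DFloat11.
print(f"effective bits/weight: {16 * compressed_gb / original_gb:.1f}")
```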

πŸ”§ How to Use

A complete usage guide is available in our GitHub repository (forked from the official OmniGen2 repository).

πŸ‘‰ https://github.com/LeanModels/OmniGen2-DFloat11 πŸ‘ˆ

πŸ” How It Works

We apply Huffman coding to losslessly compress the exponent bits of BFloat16 model weights, which are highly compressible (their 8 bits carry only ~2.6 bits of actual information). To enable fast inference, we implement a highly efficient CUDA kernel that performs on-the-fly weight decompression directly on the GPU.
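
As a rough illustration of why the exponents compress so well, the snippet below estimates the empirical entropy of the 8-bit exponent field over a tensor of BFloat16 values. Random normal values stand in for real model weights here, so the exact figure will differ from the ~2.6 bits measured on actual models:

```python
import numpy as np
import torch

# Stand-in for real model weights; actual LLM weights give ~2.6 bits.
w = torch.randn(1_000_000, dtype=torch.bfloat16)

# Reinterpret the raw 16-bit patterns; in BFloat16 the exponent occupies
# bits 14..7 (sign in bit 15, 7 mantissa bits below).
bits = w.view(torch.int16).numpy().view(np.uint16)
exponent = (bits >> 7) & 0xFF

# Empirical (Shannon) entropy of the exponent field.
counts = np.bincount(exponent, minlength=256).astype(np.float64)
p = counts[counts > 0] / counts.sum()
entropy = -(p * np.log2(p)).sum()
print(f"exponent entropy: {entropy:.2f} bits out of 8 stored")
```

Huffman coding assigns short codes to the few exponent values that dominate this distribution, which is what recovers most of the gap between the 8 bits stored and the ~2.6 bits of information they carry.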

The result is a model that is ~32% smaller, delivers bit-identical outputs, and achieves performance comparable to the original BFloat16 model.

Learn more in our research paper.

πŸ“„ Learn More
