arxiv:2310.01407

Conditional Diffusion Distillation

Published on Oct 2, 2023

Submitted by akhaliq on Oct 3, 2023

#3 Paper of the day

Authors:

Kangfu Mei, Mauricio Delbracio, Zhengzhong Tu

Abstract

Generative diffusion models provide strong priors for text-to-image generation and thereby serve as a foundation for conditional generation tasks such as image editing, restoration, and super-resolution. However, one major limitation of diffusion models is their slow sampling time. To address this challenge, we present a novel conditional distillation method designed to supplement the diffusion priors with the help of image conditions, allowing for conditional sampling with very few steps. We directly distill the unconditional pre-training in a single stage through joint-learning, largely simplifying the previous two-stage procedures that involve both distillation and conditional finetuning separately. Furthermore, our method enables a new parameter-efficient distillation mechanism that distills each task with only a small number of additional parameters combined with the shared frozen unconditional backbone. Experiments across multiple tasks including super-resolution, image editing, and depth-to-image generation demonstrate that our method outperforms existing distillation techniques for the same sampling time. Notably, our method is the first distillation strategy that can match the performance of the much slower fine-tuned conditional diffusion models.
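
To make "conditional sampling with very few steps" concrete, here is a minimal sketch of a generic deterministic DDIM-style sampling loop in plain Python. It is not the paper's sampler or model: the noise schedule `alpha_bar`, the class `ToyConditionalDenoiser`, and the function `ddim_sample` are all hypothetical names introduced for illustration, and the toy denoiser simply assumes the clean image equals the condition. The point is only that sampling cost scales with the number of denoiser calls, so a model that works at 4 steps needs 4 calls.

```python
import math
import random

T = 1000  # assumed length of the training-time noise schedule


def alpha_bar(t):
    """Assumed schedule: exactly 1.0 at t=0 (clean), about 0.001 at t=T (mostly noise)."""
    return 1.0 - 0.999 * math.sin(0.5 * math.pi * t / T) ** 2


class ToyConditionalDenoiser:
    """Hypothetical stand-in for a distilled conditional noise predictor.

    It pretends the clean image equals `cond` and returns the noise implied
    by that assumption. It also counts how often it is called.
    """

    def __init__(self):
        self.calls = 0

    def __call__(self, x_t, cond, t):
        self.calls += 1
        ab = alpha_bar(t)
        s, n = math.sqrt(ab), math.sqrt(1.0 - ab)
        return [(x - s * c) / n for x, c in zip(x_t, cond)]


def ddim_sample(denoiser, cond, num_steps=4, seed=0):
    """Deterministic DDIM-style sampling with `num_steps` denoiser calls."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in cond]  # start from Gaussian noise
    # Evenly spaced timesteps from T down to 0, e.g. [1000, 750, 500, 250, 0].
    timesteps = [round(T * (1 - i / num_steps)) for i in range(num_steps + 1)]
    for t, t_prev in zip(timesteps[:-1], timesteps[1:]):
        ab_t, ab_prev = alpha_bar(t), alpha_bar(t_prev)
        eps = denoiser(x, cond, t)
        # Estimate the clean signal from the current sample and predicted noise.
        x0 = [(xi - math.sqrt(1.0 - ab_t) * e) / math.sqrt(ab_t)
              for xi, e in zip(x, eps)]
        # Move to the less noisy timestep, reusing the same predicted noise.
        x = [math.sqrt(ab_prev) * x0i + math.sqrt(1.0 - ab_prev) * e
             for x0i, e in zip(x0, eps)]
    return x


model = ToyConditionalDenoiser()
condition = [0.2, -0.5, 0.9]
sample = ddim_sample(model, condition, num_steps=4)
```

With a real network in place of the toy denoiser, `num_steps` is the knob that distillation aims to shrink without losing quality.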

Community

osanseviero

Oct 3, 2023

Here is an ML-generated summary

Objective

The paper presents a novel conditional distillation method to distill an unconditional diffusion model into a conditional one for faster sampling while maintaining high image quality.

Insights

A new single-stage distillation approach can distill unconditional diffusion models into conditional ones, simplifying previous two-stage procedures.
Jointly optimizing for noise prediction consistency and conditional signal prediction enables replicating diffusion priors with very few sampling steps.
The proposed PREv predictor for z_hat uses original noise and improves over DDIM sampling.
The conditional guidance loss dx is important for avoiding bad local minima during distillation.
The method enables parameter-efficient distillation by freezing most parameters and only training task-specific adapters.
The distilled model matches the performance of much slower fine-tuned conditional diffusion models.
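
Two of the insights above, the joint objective and the parameter-efficient setup, can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the paper's formulation: `joint_loss`, `sgd_step`, the parameter names `backbone.w` and `adapter.a`, and the `guidance_weight` value are all hypothetical, and the consistency target is assumed to be a stop-gradient prediction taken from a neighbouring timestep.

```python
def mse(a, b):
    """Mean squared error between two equal-length lists of floats."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def joint_loss(pred, consistency_target, clean_target, guidance_weight=1.0):
    """Weighted sum of a self-consistency term and a conditional guidance term.

    Assumptions: `consistency_target` is a fixed (stop-gradient) prediction
    from a neighbouring timestep, and `clean_target` is the ground-truth image.
    """
    consistency = mse(pred, consistency_target)
    guidance = mse(pred, clean_target)
    return consistency + guidance_weight * guidance


def sgd_step(params, grads, trainable, lr=0.1):
    """Apply one SGD step to parameters named in `trainable`; leave the rest frozen."""
    return {
        name: (value - lr * grads[name]) if name in trainable else value
        for name, value in params.items()
    }


# Hypothetical parameter groups: a shared frozen backbone and a small task adapter.
params = {"backbone.w": 0.5, "adapter.a": 0.0}
grads = {"backbone.w": 0.3, "adapter.a": -1.2}
updated = sgd_step(params, grads, trainable={"adapter.a"})

loss = joint_loss(
    pred=[1.0, 2.0],
    consistency_target=[1.0, 2.0],
    clean_target=[0.0, 2.0],
    guidance_weight=0.5,
)
```

Because only the adapter entry is updated, the same frozen backbone values can be shared across tasks, with one small adapter stored per task.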

Results

The proposed conditional diffusion distillation method achieves state-of-the-art image quality with 4 sampling steps, outperforming previous distillation techniques and matching fine-tuned conditional diffusion models that require 50x more sampling steps.