arxiv:2307.01952

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

Published on Jul 4, 2023

Submitted by AK on Jul 6, 2023

#1 Paper of the day


Authors:

Dustin Podell, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, Robin Rombach

Abstract

SDXL, a latent diffusion model using a larger UNet with additional text encoders and attention mechanisms, improves text-to-image synthesis significantly.

AI-generated summary

We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention blocks and a larger cross-attention context as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared to the previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators. In the spirit of promoting open research and fostering transparency in large model training and evaluation, we provide access to code and model weights at https://github.com/Stability-AI/generative-models
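As a rough illustration of what training on multiple aspect ratios can involve, the sketch below assigns an image to the closest of several aspect-ratio "buckets". The abstract does not describe the procedure, so everything here is an assumption made for illustration: the bucketing approach itself, the roughly 1024 x 1024 pixel budget, side lengths in multiples of 64, and the hypothetical helper names `make_buckets` and `nearest_bucket`.

```python
import math


def make_buckets(target_pixels=1024 * 1024, step=64, min_side=512, max_side=2048):
    """Enumerate (width, height) pairs, both multiples of `step`,
    whose area is close to `target_pixels` (all values are assumptions)."""
    buckets = []
    for width in range(min_side, max_side + 1, step):
        # Choose the height (a multiple of `step`) that brings the area
        # closest to the target pixel budget for this width.
        height = round(target_pixels / width / step) * step
        if min_side <= height <= max_side:
            buckets.append((width, height))
    return buckets


def nearest_bucket(width, height, buckets):
    """Return the bucket whose aspect ratio is closest to the image's,
    comparing ratios in log space so wide and tall shapes are treated evenly."""
    log_ratio = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - log_ratio))


buckets = make_buckets()
print(nearest_bucket(1920, 1080, buckets))  # a landscape bucket
print(nearest_bucket(800, 800, buckets))    # the square bucket
```

In a setup like this, images would typically be resized and cropped to their assigned bucket so that every batch contains a single resolution; whether SDXL's training works exactly this way is not something the abstract states.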


Community

mikeerl

Jul 8, 2023

Hey, I'm reviewing deep learning papers on Twitter daily in Hebrew under the hashtag #shorthebrewpapereviews (https://twitter.com/hashtag/shorthebrewpapereviews?src=hashtag_click). So far I've shortly reviewed about deep learning papers. You are invited to follow and comment.

This paper review can be found at https://twitter.com/MikeE_3_14/status/1677747429221838848?s=20