Abstract
Recent advances in continuous generative models, including multi-step approaches like diffusion and flow-matching (typically requiring 8-1000 sampling steps) and few-step methods such as consistency models (typically 1-8 steps), have demonstrated impressive generative performance. However, existing work often treats these approaches as distinct paradigms, resulting in separate training and sampling methodologies. We introduce a unified framework for training, sampling, and analyzing these models. Our implementation, the Unified Continuous Generative Models Trainer and Sampler (UCGM-{T,S}), achieves state-of-the-art (SOTA) performance. For example, on ImageNet 256x256 using a 675M diffusion transformer, UCGM-T trains a multi-step model achieving 1.30 FID in 20 steps and a few-step model reaching 1.42 FID in just 2 steps. Additionally, applying UCGM-S to a pre-trained model (previously 1.26 FID at 250 steps) improves performance to 1.06 FID in only 40 steps. Code is available at: https://github.com/LINs-lab/UCGM.
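For intuition only, below is a minimal, hypothetical sketch of what a single sampler covering both regimes can look like: with many steps, repeated small jumps approximate a deterministic diffusion/flow ODE solver, while with one or two steps the same update reduces to consistency-style jumps from a clean-data prediction. This is not the authors' UCGM-{T,S} implementation (see the linked repository for that); the `denoiser`, the linear `alpha`/`sigma` schedule, and all names here are illustrative assumptions.

```python
# Hypothetical sketch of a unified sampler; NOT the official UCGM code.
# Assumes an x0-predicting network and a linear (rectified-flow-style) schedule:
#   x_t = alpha(t) * x_0 + sigma(t) * eps,  with alpha(t) = 1 - t, sigma(t) = t.
import torch

def alpha(t):
    return 1.0 - t

def sigma(t):
    return t

@torch.no_grad()
def unified_sample(denoiser, shape, num_steps):
    """One update rule for both regimes: many small jumps approximate the
    deterministic probability-flow ODE (diffusion/flow-matching sampling);
    one or two large jumps recover consistency-style sampling."""
    ts = torch.linspace(1.0, 0.0, num_steps + 1)  # noise (t=1) -> data (t=0)
    x = torch.randn(shape)                        # start from pure noise
    for i in range(num_steps):
        t, s = ts[i], ts[i + 1]
        x0_hat = denoiser(x, t.expand(shape[0]))  # predicted clean sample
        eps_hat = (x - alpha(t) * x0_hat) / max(sigma(t).item(), 1e-6)  # implied noise
        x = alpha(s) * x0_hat + sigma(s) * eps_hat  # deterministic jump to time s
    return x

# Placeholder standing in for a trained network (e.g., a diffusion transformer).
denoiser = lambda x, t: torch.tanh(x)  # hypothetical stand-in; ignores t

multi_step = unified_sample(denoiser, (4, 2), num_steps=40)  # diffusion/flow regime
few_step = unified_sample(denoiser, (4, 2), num_steps=2)     # consistency regime
```

Note that with `num_steps=1` the loop collapses to a single clean-data prediction from pure noise, which is exactly the one-step consistency-model inference pattern.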
Community
We have introduced a unified framework (UCGM) for training, sampling, and analyzing both multi-step models, such as diffusion and flow matching, and few-step methods, such as consistency models.
Notably, we achieve state-of-the-art (SOTA) performance on ImageNet 256x256 (1.06 FID with 40 sampling steps, 1.42 FID with 2 sampling steps) and ImageNet 512x512 (1.24 FID with 150 sampling steps, 1.75 FID with 2 sampling steps)!
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Fast Autoregressive Models for Continuous Latent Generation (2025)
- An Empirical Study of GPT-4o Image Generation Capabilities (2025)
- Wan: Open and Advanced Large-Scale Video Generative Models (2025)
- InstaRevive: One-Step Image Enhancement via Dynamic Score Matching (2025)
- Boosting Generative Image Modeling via Joint Image-Feature Synthesis (2025)
- Physics-aware generative models for turbulent fluid flows through energy-consistent stochastic interpolants (2025)
- Conditional Data Synthesis Augmentation (2025)