GMem: Generative Modeling with Explicit Memory

Teaser image

Yi Tang :man_student:, Peng Sun :man_artist:, Zhenglin Cheng :man_student:, Tao Lin :skier:

[arXiv] :page_facing_up: | [BibTeX] :label:


Abstract

Recent studies indicate that the denoising process in deep generative diffusion models implicitly learns and memorizes semantic information from the data distribution. These findings suggest that capturing more complex data distributions requires larger neural networks, leading to a substantial increase in computational demand, which in turn becomes the primary bottleneck in both training and inference of diffusion models. To this end, we introduce Generative Modeling with Explicit Memory (GMem), which leverages an external memory bank in both the training and sampling phases of diffusion models. This approach preserves semantic information from data distributions, reducing the reliance on neural network capacity for learning and generalizing across diverse datasets. The results are significant: GMem improves training efficiency, sampling efficiency, and generation quality. For instance, on ImageNet at $256 \times 256$ resolution, GMem accelerates SiT training by over $46.7\times$, matching the performance of a SiT model trained for $7M$ steps in fewer than $150K$ steps. Compared to the most efficient existing method, REPA, GMem still offers a $16\times$ speedup, attaining an FID score of 5.75 within $250K$ steps, whereas REPA requires over $4M$ steps. Additionally, our method achieves state-of-the-art generation quality, with an FID score of 3.56 without classifier-free guidance on ImageNet $256\times256$.
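
To make the core idea concrete, here is a minimal sketch (not the official GMem API) of how sampling can be conditioned on an entry retrieved from an external memory bank rather than relying on network capacity alone. All names here (`velocity_net`, `memory_bank`, the shapes, and the Euler integrator) are illustrative assumptions, not the repository's actual interfaces:

```python
# Hypothetical sketch of memory-conditioned sampling with a flow/diffusion model.
# NOT the official GMem implementation; names and shapes are assumptions.
import torch

num_steps = 50
bank_size, mem_dim = 1_000, 768      # real released bank has 640,000 entries
img_shape = (4, 32, 32)              # assumed latent-space shape

# External memory bank: one semantic vector per entry, kept outside the network.
memory_bank = torch.randn(bank_size, mem_dim)

def velocity_net(x, t, mem):
    """Stand-in for a SiT-style network that predicts a velocity field from
    the noisy latent `x`, timestep `t`, and memory condition `mem`."""
    return torch.zeros_like(x)  # placeholder so the sketch runs end to end

@torch.no_grad()
def sample(batch_size=4):
    # Draw one memory entry per sample; in GMem this selection would encode
    # the desired semantics rather than being uniform at random.
    idx = torch.randint(bank_size, (batch_size,))
    mem = memory_bank[idx]

    x = torch.randn(batch_size, *img_shape)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((batch_size,), i * dt)
        # Simple Euler step on the learned flow, conditioned on the memory entry.
        x = x + dt * velocity_net(x, t, mem)
    return x
```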


GMem Checkpoints

We offer the following pre-trained model and memory bank here:

| File | Backbone | Training Steps | Dataset | Bank Size | Training Epochs | Download |
|------|----------|----------------|---------|-----------|-----------------|----------|
| `GMem_XL_2Miter_ImageNet-1K_K640000_5epo.pth` | SiT-XL/2 | 2M | ImageNet $256\times 256$ | 640,000 | 5 | Huggingface |
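
A hedged example of inspecting the released checkpoint with plain PyTorch. The internal layout of the `.pth` file (e.g., whether the model weights and the memory bank sit under separate keys) is an assumption, so the sketch only loads the file and prints its top-level keys before committing to any structure:

```python
# Inspect the downloaded checkpoint before wiring it into a model.
# The key layout is an assumption; check the printed keys first.
import torch

ckpt = torch.load(
    "GMem_XL_2Miter_ImageNet-1K_K640000_5epo.pth",
    map_location="cpu",
    # On recent PyTorch versions you may need weights_only=False if the
    # checkpoint stores more than raw tensors.
)
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # reveals the actual layout (weights, memory bank, ...)
```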