GMem: Generative Modeling with Explicit Memory

Teaser image

Yi Tang :man_student:, Peng Sun :man_artist:, Zhenglin Cheng :man_student:, Tao Lin :skier:

[arXiv] :page_facing_up: | [BibTeX] :label:


Abstract

Recent studies indicate that the denoising process in deep generative diffusion models implicitly learns and memorizes semantic information from the data distribution. These findings suggest that capturing more complex data distributions requires larger neural networks, leading to a substantial increase in computational demand, which in turn becomes the primary bottleneck in both training and inference of diffusion models. To this end, we introduce Generative Modeling with Explicit Memory (GMem), which leverages an external memory bank in both the training and sampling phases of diffusion models. This approach preserves semantic information from data distributions, reducing the reliance on neural network capacity for learning and generalizing across diverse datasets. The results are significant: GMem improves training efficiency, sampling efficiency, and generation quality. For instance, on ImageNet at $256 \times 256$ resolution, GMem accelerates SiT training by over $46.7\times$, matching the performance of a SiT model trained for $7M$ steps in fewer than $150K$ steps. Compared to the most efficient existing method, REPA, GMem still offers a $16\times$ speedup, attaining an FID score of 5.75 within $250K$ steps, whereas REPA requires over $4M$ steps. Additionally, our method achieves state-of-the-art generation quality, with an FID score of 3.56 without classifier-free guidance on ImageNet $256\times256$.
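
To make the core idea concrete, here is a minimal sketch (not the official GMem API) of how sampling can be conditioned on an entry retrieved from an external memory bank rather than relying on network capacity alone. All names here (`velocity_net`, `memory_bank`, the shapes, and the Euler integrator) are illustrative assumptions, not the repository's actual interfaces:

```python
# Hypothetical sketch of memory-conditioned sampling with a flow/diffusion model.
# NOT the official GMem implementation; names and shapes are assumptions.
import torch

num_steps = 50
bank_size, mem_dim = 1_000, 768      # real released bank has 640,000 entries
img_shape = (4, 32, 32)              # assumed latent-space shape

# External memory bank: one semantic vector per entry, kept outside the network.
memory_bank = torch.randn(bank_size, mem_dim)

def velocity_net(x, t, mem):
    """Stand-in for a SiT-style network that predicts a velocity field from
    the noisy latent `x`, timestep `t`, and memory condition `mem`."""
    return torch.zeros_like(x)  # placeholder so the sketch runs end to end

@torch.no_grad()
def sample(batch_size=4):
    # Draw one memory entry per sample; in GMem this selection would encode
    # the desired semantics rather than being uniform at random.
    idx = torch.randint(bank_size, (batch_size,))
    mem = memory_bank[idx]

    x = torch.randn(batch_size, *img_shape)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((batch_size,), i * dt)
        # Simple Euler step on the learned flow, conditioned on the memory entry.
        x = x + dt * velocity_net(x, t, mem)
    return x
```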


GMem Checkpoints

We offer the following pre-trained model and memory bank here:

| File | Backbone | Training Steps | Dataset | Bank Size | Training Epochs | Download |
|------|----------|----------------|---------|-----------|-----------------|----------|
| `GMem_XL_2Miter_ImageNet-1K_K640000_5epo.pth` | SiT-XL/2 | 2M | ImageNet $256\times 256$ | 640,000 | 5 | Huggingface |
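
A hedged example of inspecting the released checkpoint with plain PyTorch. The internal layout of the `.pth` file (e.g., whether the model weights and the memory bank sit under separate keys) is an assumption, so the sketch only loads the file and prints its top-level keys before committing to any structure:

```python
# Inspect the downloaded checkpoint before wiring it into a model.
# The key layout is an assumption; check the printed keys first.
import torch

ckpt = torch.load(
    "GMem_XL_2Miter_ImageNet-1K_K640000_5epo.pth",
    map_location="cpu",
    # On recent PyTorch versions you may need weights_only=False if the
    # checkpoint stores more than raw tensors.
)
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # reveals the actual layout (weights, memory bank, ...)
```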