🌿 Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

We introduce LlamaGen, a new family of image generation models that applies the original next-token prediction paradigm of large language models to the visual generation domain. It is an affirmative answer to the question of whether vanilla autoregressive models, e.g., Llama, without inductive biases on visual signals, can achieve state-of-the-art image generation performance if scaled properly. We reexamine the design spaces of image tokenizers, the scalability properties of image generation models, and the quality of their training data.
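As a rough illustration of next-token prediction applied to images, the sketch below samples a grid of discrete image tokens one at a time, each conditioned on a class label and the tokens already generated. It assumes the image is represented as tokenizer codes flattened in raster order. `toy_logits` is a hypothetical stand-in for the Llama-style transformer, and the grid size and codebook size are made-up values, so this is not LlamaGen's actual implementation.

```python
import math
import random


def toy_logits(class_label, tokens, vocab_size):
    """Hypothetical stand-in for the transformer: one logit per codebook entry.

    It only favors repeating the previous token (or a label-derived token
    at the first position); a real model would attend to the whole prefix.
    """
    logits = [0.0] * vocab_size
    favored = tokens[-1] if tokens else class_label % vocab_size
    logits[favored] = 2.0
    return logits


def sample_from_logits(logits, rng, temperature=1.0):
    """Draw one token id from softmax(logits / temperature)."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(value - peak) for value in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def generate_image_tokens(class_label, grid_size, vocab_size, seed=0):
    """Autoregressively sample grid_size * grid_size tokens in raster order."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(grid_size * grid_size):
        logits = toy_logits(class_label, tokens, vocab_size)
        tokens.append(sample_from_logits(logits, rng))
    # Reshape the flat sequence back into rows of the token grid.
    return [tokens[r * grid_size:(r + 1) * grid_size] for r in range(grid_size)]


grid = generate_image_tokens(class_label=3, grid_size=4, vocab_size=16, seed=0)
for row in grid:
    print(row)
```

In a full pipeline the sampled grid would be passed to the image tokenizer's decoder to produce pixels; that step is omitted here.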

This repo hosts LlamaGen's checkpoints. For more details and tutorials, see https://github.com/FoundationVision/LlamaGen
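One possible way to fetch the hosted checkpoints is through the `huggingface_hub` client, sketched below. It assumes `huggingface_hub` is installed; the `download_checkpoints` helper and the `./LlamaGen` target directory are illustrative names, not part of the LlamaGen codebase.

```python
REPO_ID = "FoundationVision/LlamaGen"  # repository id of this model page


def download_checkpoints(local_dir="./LlamaGen"):
    """Download every file in the repo and return the local folder path.

    The import is deferred so that defining this helper does not require
    huggingface_hub to be installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_checkpoints()` pulls the full repository, which may be large. The GitHub tutorials describe which checkpoint files each model variant expects.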

Paper: arxiv.org/abs/2406.06525

Paper for FoundationVision/LlamaGen: Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation (arXiv 2406.06525, published Jun 10, 2024)