This is the model release for the paper *Elucidating the Design Space of Language Models for Image Generation*.
You can find the paper on arXiv and the code on GitHub.
We provide four Binary-Autoencoder (BAE) tokenizers, following Binary Latent Diffusion, with code dimensions 16, 20, and 24 (the two 16-dimensional variants differ in whether Bernoulli sampling is used), each trained for 1,000,000 iterations with batch size 256. A sketch of the binary bottleneck follows the table.
| Code Dim | Bernoulli Sampling | Link | Size |
|---|---|---|---|
| 16 | ✗ | link | 332MB |
| 16 | ✓ | link | 332MB |
| 20 | ✓ | link | 332MB |
| 24 | ✓ | link | 332MB |
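To make the "Bernoulli sampling" column concrete, here is a minimal sketch of a binary bottleneck in the style of Binary Latent Diffusion. It is not the released code: the class name, the straight-through trick, and the `bernoulli_sampling` flag are illustrative assumptions, showing only how an encoder output could be binarized either stochastically or by thresholding.

```python
import torch
import torch.nn as nn

class BinaryQuantizer(nn.Module):
    """Illustrative binary bottleneck (hypothetical, not the released code).

    Encoder logits are squashed to [0, 1] and binarized, either by
    Bernoulli sampling or by deterministic thresholding, with a
    straight-through estimator so gradients still flow to the encoder.
    """

    def __init__(self, bernoulli_sampling: bool = True):
        super().__init__()
        self.bernoulli_sampling = bernoulli_sampling

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        probs = torch.sigmoid(logits)  # per-dimension code probabilities
        if self.bernoulli_sampling:
            codes = torch.bernoulli(probs)   # stochastic binarization
        else:
            codes = (probs > 0.5).float()    # deterministic threshold
        # Straight-through: binary values on the forward pass,
        # sigmoid gradients on the backward pass.
        return codes + probs - probs.detach()
```

Under this sketch, a tokenizer with code dimension 16 would apply the quantizer to a 16-channel latent map, so every spatial position becomes a 16-bit binary code.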
The generation model architecture is adapted from Llama-2, following LlamaGen.
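For readers unfamiliar with LlamaGen-style image generation, the sketch below shows the usual autoregressive sampling loop over image tokens. All names (`model`, `cond_tokens`, the temperature/top-k handling) are assumptions for illustration, not this repository's API; the model is assumed to return next-token logits of shape `(batch, seq, vocab)`.

```python
import torch

@torch.no_grad()
def generate_image_tokens(model, cond_tokens, seq_len, temperature=1.0, top_k=None):
    """Hypothetical sampling loop for a Llama-style image-token generator.

    cond_tokens: conditioning prefix (e.g. a class token), shape (B, T0).
    Returns seq_len sampled image tokens per sequence.
    """
    tokens = cond_tokens
    for _ in range(seq_len):
        logits = model(tokens)[:, -1, :] / temperature  # next-token logits
        if top_k is not None:
            v, _ = torch.topk(logits, top_k)
            logits[logits < v[:, [-1]]] = -float("inf")  # keep top-k only
        probs = torch.softmax(logits, dim=-1)
        next_tok = torch.multinomial(probs, num_samples=1)
        tokens = torch.cat([tokens, next_tok], dim=1)
    return tokens[:, cond_tokens.shape[1] :]  # strip the conditioning prefix
```

The sampled token sequence would then be mapped back to binary latent codes and decoded to pixels by the BAE tokenizer's decoder.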