
Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis


📖 Introduction

We present Infinity, a Bitwise Visual AutoRegressive Model capable of generating high-resolution, photorealistic images. Infinity redefines the visual autoregressive model under a bitwise token prediction framework with an infinite-vocabulary tokenizer and classifier and bitwise self-correction. By theoretically scaling the tokenizer vocabulary size to infinity while concurrently scaling the transformer size, our method unlocks substantially stronger scaling capabilities. Infinity sets a new record for autoregressive text-to-image models, outperforming top-tier diffusion models like SD3-Medium and SDXL. Notably, Infinity surpasses SD3-Medium by improving the GenEval benchmark score from 0.62 to 0.73 and the ImageReward benchmark score from 0.87 to 0.96, achieving a win rate of 66%. Without extra optimization, Infinity generates a high-quality 1024×1024 image in 0.8 seconds, making it 2.6× faster than SD3-Medium and establishing it as the fastest text-to-image model.
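To make the idea of bitwise token prediction concrete, here is a minimal sketch, not taken from the Infinity code. It assumes a token is a `num_bits`-bit binary code and compares the number of classes a conventional index-wise classifier would need with the number of binary decisions a per-bit classifier would need. All function names and bit widths are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: the helper names and bit widths below are
# assumptions for explanation, not part of the Infinity implementation.

def index_to_bits(index: int, num_bits: int) -> list[int]:
    """Return the num_bits-bit binary code of index, most significant bit first."""
    if not 0 <= index < 2 ** num_bits:
        raise ValueError("index out of range for the given bit width")
    return [(index >> shift) & 1 for shift in range(num_bits - 1, -1, -1)]


def bits_to_index(bits: list[int]) -> int:
    """Inverse of index_to_bits: fold a bit list back into an integer index."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return index


def classifier_sizes(num_bits: int) -> tuple[int, int]:
    """Return (classes for index-wise prediction, binary decisions for bitwise prediction)."""
    return 2 ** num_bits, num_bits


if __name__ == "__main__":
    # The index-wise class count grows exponentially with the bit width,
    # while the number of per-bit decisions grows only linearly.
    for d in (8, 16, 32):
        classes, decisions = classifier_sizes(d)
        print(f"{d} bits: {classes} index classes vs {decisions} bit decisions")
```

Under this reading, growing the vocabulary means adding bits to each token rather than adding rows to a softmax layer, which is one way to see why the vocabulary can be scaled toward very large sizes.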

📌 Note

This repo hosts Infinity's checkpoints. For more details, please refer to the code repository.

📖 Citation

If our work assists your research, feel free to give us a star ⭐ or cite us using:

@misc{han2024infinityscalingbitwiseautoregressive,
    title={Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis}, 
    author={Jian Han and Jinlai Liu and Yi Jiang and Bin Yan and Yuqi Zhang and Zehuan Yuan and Bingyue Peng and Xiaobing Liu},
    year={2024},
    eprint={2412.04431},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2412.04431}, 
}