---
license: mit
pipeline_tag: unconditional-image-generation
library_name: diffusers
---

# Autoregressive Image Generation with Randomized Parallel Decoding

[Haopeng Li](https://github.com/hp-l33)<sup>1</sup>, Jinyue Yang<sup>2</sup>, [Guoqi Li](https://casialiguoqi.github.io)<sup>2,📧</sup>, [Huan Wang](https://huanwang.tech)<sup>1,📧</sup>

<sup>1</sup> Westlake University,
<sup>2</sup> Institute of Automation, Chinese Academy of Sciences

## TL;DR
**ARPG** is a novel autoregressive image generation framework capable of performing **BERT-style masked modeling** with a **GPT-style causal architecture**.

``💪 FID 1.94`` ``🚀 Fast Speed`` ``♻️ Low Memory Usage`` ``🎲 Random Order`` ``💡 Zero-shot Inference``

## Usage
Load the model through the Hugging Face `DiffusionPipeline`. Parameters such as the model type, number of steps, and class labels can optionally be customized.
```python
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("hp-l33/ARPG", custom_pipeline="hp-l33/ARPG")

class_labels = [207, 360, 388, 113, 355, 980, 323, 979]

generated_image = pipeline(
    model_type="ARPG-XL",       # choose from 'ARPG-L', 'ARPG-XL', or 'ARPG-XXL'
    seed=0,                     # set a seed for reproducibility
    num_steps=64,               # number of autoregressive steps
    class_labels=class_labels,  # provide valid ImageNet class labels
    cfg_scale=4,                # classifier-free guidance scale
    output_dir="./images",      # directory to save generated images
    cfg_schedule="constant",    # choose between 'constant' (suggested) and 'linear'
    sample_schedule="arccos",   # choose between 'arccos' (suggested) and 'cosine'
)

generated_image.show()
```
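
Because the pipeline call takes several string-valued options, it can help to check them before loading the model weights. The following is a minimal sketch, not part of the ARPG codebase: `build_pipeline_kwargs` is a hypothetical helper, the allowed values are copied from the comments in the example above, and the label range 0 to 999 assumes the ImageNet-1k class indexing.

```python
# Allowed values copied from the comments in the usage example.
MODEL_TYPES = ("ARPG-L", "ARPG-XL", "ARPG-XXL")
CFG_SCHEDULES = ("constant", "linear")
SAMPLE_SCHEDULES = ("arccos", "cosine")
NUM_CLASSES = 1000  # assumption: ImageNet-1k label indices 0..999


def build_pipeline_kwargs(class_labels, model_type="ARPG-XL", num_steps=64,
                          cfg_scale=4, cfg_schedule="constant",
                          sample_schedule="arccos", seed=0,
                          output_dir="./images"):
    """Hypothetical helper: validate the arguments and return them as a dict."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"model_type must be one of {MODEL_TYPES}")
    if cfg_schedule not in CFG_SCHEDULES:
        raise ValueError(f"cfg_schedule must be one of {CFG_SCHEDULES}")
    if sample_schedule not in SAMPLE_SCHEDULES:
        raise ValueError(f"sample_schedule must be one of {SAMPLE_SCHEDULES}")
    if not isinstance(num_steps, int) or num_steps < 1:
        raise ValueError("num_steps must be a positive integer")
    labels = list(class_labels)
    if not labels:
        raise ValueError("class_labels must not be empty")
    for label in labels:
        if not isinstance(label, int) or not 0 <= label < NUM_CLASSES:
            raise ValueError(f"invalid class label: {label!r}")
    return {
        "model_type": model_type,
        "seed": seed,
        "num_steps": num_steps,
        "class_labels": labels,
        "cfg_scale": cfg_scale,
        "output_dir": output_dir,
        "cfg_schedule": cfg_schedule,
        "sample_schedule": sample_schedule,
    }
```

The returned dictionary can then be unpacked into the call, for example `pipeline(**build_pipeline_kwargs(class_labels))`, so a typo in an option name surfaces as a `ValueError` before any sampling starts.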

## Citation
If this work is helpful for your research, please give it a star or cite it:
```bibtex
@article{li2025autoregressive,
    title={Autoregressive Image Generation with Randomized Parallel Decoding},
    author={Haopeng Li and Jinyue Yang and Guoqi Li and Huan Wang},
    journal={arXiv preprint arXiv:2503.10568},
    year={2025}
}
```

## Acknowledgement

Thanks to [LlamaGen](https://github.com/FoundationVision/LlamaGen) for its open-source codebase. We also appreciate [RandAR](https://github.com/ziqipang/RandAR) and [RAR](https://github.com/bytedance/1d-tokenizer/blob/main/README_RAR.md) for inspiring this work, and thank [ControlAR](https://github.com/hustvl/ControlAR).