BitDance: Scaling Autoregressive Generative Models with Binary Tokens Paper • 2602.14041 • Published 14 days ago • 50
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence Paper • 2602.08683 • Published 20 days ago • 49
Autoregressive Image Generation with Masked Bit Modeling Paper • 2602.09024 • Published 19 days ago • 6
PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss Paper • 2602.02493 • Published 26 days ago • 42
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models Paper • 2512.24165 • Published Dec 30, 2025 • 51
Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published Dec 18, 2025 • 87
Towards Scalable Pre-training of Visual Tokenizers for Generation Paper • 2512.13687 • Published Dec 15, 2025 • 106
From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images Paper • 2511.22805 • Published Nov 27, 2025 • 4
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published Nov 27, 2025 • 239
REASONEDIT: Towards Reasoning-Enhanced Image Editing Models Paper • 2511.22625 • Published Nov 27, 2025 • 47
Kimi Linear: An Expressive, Efficient Attention Architecture Paper • 2510.26692 • Published Oct 30, 2025 • 127
The End of Manual Decoding: Towards Truly End-to-End Language Models Paper • 2510.26697 • Published Oct 30, 2025 • 117