Byte Latent Transformer: Patches Scale Better Than Tokens • arXiv:2412.09871 • Published Dec 13, 2024 • 107 upvotes
Causal Diffusion Transformers for Generative Modeling • arXiv:2412.12095 • Published Dec 16, 2024 • 23 upvotes
Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging • arXiv:2504.08635 • Published Apr 11, 2025 • 5 upvotes
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation • arXiv:2504.09454 • Published Apr 13, 2025 • 12 upvotes
Efficient Generative Model Training via Embedded Representation Warmup • arXiv:2504.10188 • Published Apr 14, 2025 • 12 upvotes
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax • arXiv:2504.20966 • Published Apr 29, 2025 • 32 upvotes
LaTtE-Flow: Layerwise Timestep-Expert Flow-based Transformer • arXiv:2506.06952 • Published Jun 8, 2025 • 10 upvotes
Marrying Autoregressive Transformer and Diffusion with Multi-Reference Autoregression • arXiv:2506.09482 • Published Jun 11, 2025 • 46 upvotes
From Bytes to Ideas: Language Modeling with Autoregressive U-Nets • arXiv:2506.14761 • Published Jun 17, 2025 • 16 upvotes
Energy-Based Transformers are Scalable Learners and Thinkers • arXiv:2507.02092 • Published Jul 2, 2025 • 58 upvotes
Dynamic Chunking for End-to-End Hierarchical Sequence Modeling • arXiv:2507.07955 • Published Jul 2025 • 22 upvotes