LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning Paper • 2505.16933 • Published 2 days ago
Scaling Diffusion Transformers Efficiently via μP Paper • 2505.15270 • Published 4 days ago
The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing Paper • 2311.01410 • Published Nov 2, 2023
Revisiting Discriminative vs. Generative Classifiers: Theory and Implications Paper • 2302.02334 • Published Feb 5, 2023
On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability Paper • 2405.16845 • Published May 27, 2024