On Path to Multimodal Generalist: General-Level and General-Bench • Paper • 2505.04620 • Published 5 days ago • 71
Flow-GRPO: Training Flow Matching Models via Online RL • Paper • 2505.05470 • Published 4 days ago • 62
Hierarchical Vision-Language Alignment for Text-to-Image Generation via Diffusion Models • Paper • 2501.00917 • Published Jan 1 • 1
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models • Paper • 2502.10458 • Published Feb 12 • 36
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax • Paper • 2504.20966 • Published 13 days ago • 26
Improving Editability in Image Generation with Layer-wise Memory • Paper • 2505.01079 • Published 11 days ago • 27
Step1X-Edit: A Practical Framework for General Image Editing • Paper • 2504.17761 • Published 18 days ago • 88
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation • Paper • 2504.08736 • Published Apr 11 • 47