UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation Paper • 2506.03147 • Published 4 days ago • 57
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning Paper • 2506.01713 • Published 5 days ago • 31
UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation Paper • 2505.24521 • Published 9 days ago • 15
ViStoryBench: Comprehensive Benchmark Suite for Story Visualization Paper • 2505.24862 • Published 8 days ago • 31
Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model Paper • 2505.23606 • Published 9 days ago • 14
OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation Paper • 2505.20292 • Published 12 days ago • 52
KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models Paper • 2505.16707 • Published 16 days ago • 42
Flow-GRPO: Training Flow Matching Models via Online RL Paper • 2505.05470 • Published about 1 month ago • 78
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT Paper • 2505.00703 • Published May 1 • 42
Step1X-Edit: A Practical Framework for General Image Editing Paper • 2504.17761 • Published Apr 24 • 88
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Paper • 2504.12626 • Published Apr 17 • 50
ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation Paper • 2503.22194 • Published Mar 28 • 24
LEGION: Learning to Ground and Explain for Synthetic Image Detection Paper • 2503.15264 • Published Mar 19 • 21
CoRe^2: Collect, Reflect and Refine to Generate Better and Faster Paper • 2503.09662 • Published Mar 12 • 34
MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice Paper • 2503.05978 • Published Mar 7 • 35