Action Images: End-to-End Policy Learning via Multiview Video Generation Paper • 2604.06168 • Published 3 days ago • 9
EgoSim: Egocentric World Simulator for Embodied Interaction Generation Paper • 2604.01001 • Published 9 days ago • 36
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization Paper • 2604.02268 • Published 8 days ago • 91
CutClaw: Agentic Hours-Long Video Editing via Music Synchronization Paper • 2603.29664 • Published 9 days ago • 47
LongCat-Next: Lexicalizing Modalities as Discrete Tokens Paper • 2603.27538 • Published 11 days ago • 137
VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward Paper • 2603.26599 • Published 13 days ago • 59
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling Paper • 2603.25746 • Published 14 days ago • 154
Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing Paper • 2603.12254 • Published 28 days ago • 21
UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation Paper • 2603.23500 • Published 16 days ago • 35
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model Paper • 2603.21986 • Published 17 days ago • 121
Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models Paper • 2603.17051 • Published 23 days ago • 108
MosaicMem: Hybrid Spatial Memory for Controllable Video World Models Paper • 2603.17117 • Published 23 days ago • 87
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild Paper • 2603.17187 • Published 23 days ago • 136
ESPIRE: A Diagnostic Benchmark for Embodied Spatial Reasoning of Vision-Language Models Paper • 2603.13033 • Published 27 days ago • 13
GradMem: Learning to Write Context into Memory with Test-Time Gradient Descent Paper • 2603.13875 • Published 26 days ago • 34
Learning Latent Proxies for Controllable Single-Image Relighting Paper • 2603.15555 • Published 24 days ago • 8