SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers • Paper • 2506.00830 • Published 7 days ago • 5
DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation • Paper • 2506.03123 • Published 4 days ago • 14
Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes • Paper • 2506.00227 • Published 8 days ago • 11
FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation • Paper • 2506.01144 • Published 6 days ago • 14
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics • Paper • 2506.00070 • Published 9 days ago • 26
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces • Paper • 2506.00123 • Published 8 days ago • 32
Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control • Paper • 2506.01943 • Published 5 days ago • 24
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models • Paper • 2506.03135 • Published 4 days ago • 36
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation • Paper • 2506.03147 • Published 4 days ago • 57
EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance • Paper • 2505.21876 • Published 11 days ago • 9
FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control • Paper • 2505.22642 • Published 10 days ago • 3
Styl3R: Instant 3D Stylized Reconstruction for Arbitrary Scenes and Styles • Paper • 2505.21060 • Published 12 days ago • 4
Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment • Paper • 2505.18600 • Published 15 days ago • 45
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO • Paper • 2505.21457 • Published 11 days ago • 14
Cosmos-Reason1 • Collection • Multimodal world understanding through reasoning • 5 items • Updated 1 day ago • 26
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning • Paper • 2505.14231 • Published 19 days ago • 51
Emerging Properties in Unified Multimodal Pretraining • Paper • 2505.14683 • Published 18 days ago • 129
Hybrid 3D-4D Gaussian Splatting for Fast Dynamic Scene Representation • Paper • 2505.13215 • Published 20 days ago • 28