WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments Paper • 2504.03886 • Published 6 days ago • 3
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning Paper • 2504.06958 • Published 1 day ago • 6
DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion Paper • 2504.04010 • Published 6 days ago • 7
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting Paper • 2504.05541 • Published 3 days ago • 10
FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis Paper • 2504.04842 • Published 4 days ago • 18
GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography Paper • 2504.07083 • Published 1 day ago • 18
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens Paper • 2504.07096 • Published 1 day ago • 44
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing Paper • 2504.02826 • Published 7 days ago • 67
SPF-Portrait: Towards Pure Portrait Customization with Semantic Pollution-Free Fine-tuning Paper • 2504.00396 • Published 10 days ago • 4
HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration Paper • 2504.03536 • Published 6 days ago • 9
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 3 days ago • 144