Abstract
SpatialTrackerV2 is a feed-forward 3D point tracking method for monocular videos that integrates point tracking, monocular depth, and camera pose estimation into a unified, end-to-end architecture, achieving high performance and speed.
We present SpatialTrackerV2, a feed-forward 3D point tracking method for monocular videos. Going beyond modular pipelines built on off-the-shelf components for 3D tracking, our approach unifies the intrinsic connections between point tracking, monocular depth, and camera pose estimation into a high-performing, feed-forward 3D point tracker. It decomposes world-space 3D motion into scene geometry, camera ego-motion, and pixel-wise object motion, with a fully differentiable and end-to-end architecture, allowing scalable training across a wide range of datasets, including synthetic sequences, posed RGB-D videos, and unlabeled in-the-wild footage. By learning geometry and motion jointly from such heterogeneous data, SpatialTrackerV2 outperforms existing 3D tracking methods by 30%, and matches the accuracy of leading dynamic 3D reconstruction approaches while running 50× faster.
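To make the decomposition concrete, below is a minimal sketch (not the authors' code) of how a world-space 3D trajectory can be composed from the three components the abstract names: scene geometry (per-pixel depth), camera ego-motion (extrinsics), and a pixel-wise object-motion residual. All function names, tensor shapes, and the PyTorch implementation are illustrative assumptions.

```python
# Sketch: compose world-space 3D points from depth, camera pose, and object motion.
# Names and shapes are assumptions for illustration, not SpatialTrackerV2's API.
import torch

def compose_world_trajectory(uv, depth, K, cam_to_world, object_motion):
    """
    uv:            (N, 2) tracked pixel coordinates in one frame
    depth:         (N,)   predicted depth at those pixels
    K:             (3, 3) camera intrinsics
    cam_to_world:  (4, 4) camera pose (camera-to-world) for the frame
    object_motion: (N, 3) residual world-space displacement of dynamic points
    returns:       (N, 3) world-space 3D point positions
    """
    ones = torch.ones_like(depth)

    # Scene geometry: unproject pixels into camera space using depth and intrinsics.
    pix_h = torch.cat([uv, ones[:, None]], dim=1)                   # (N, 3) homogeneous pixels
    cam_pts = (torch.linalg.inv(K) @ pix_h.T).T * depth[:, None]    # (N, 3) camera-space points

    # Camera ego-motion: lift camera-space points into the world frame.
    cam_pts_h = torch.cat([cam_pts, ones[:, None]], dim=1)          # (N, 4)
    world_pts = (cam_to_world @ cam_pts_h.T).T[:, :3]               # (N, 3)

    # Pixel-wise object motion: add the residual displacement of dynamic content.
    return world_pts + object_motion
```

Evaluating this composition per frame yields a 3D trajectory for each tracked point; a static point would simply have a zero object-motion residual.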
Community
Project: https://spatialtracker.github.io/
Code & Models: https://github.com/henry123-boy/SpaTrackerV2
Online Demo: https://huggingface.co/spaces/Yuxihenry/SpatialTrackerV2
SpatialTrackerV2 is the first feed-forward model for dynamic reconstruction and 3D point tracking.