G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration
Abstract
G-CUT3R enhances 3D scene reconstruction by integrating auxiliary data through dedicated encoders and zero convolution, improving performance across benchmarks.
We introduce G-CUT3R, a novel feed-forward approach for guided 3D scene reconstruction that enhances the CUT3R model by integrating prior information. Unlike existing feed-forward methods that rely solely on input images, our method leverages auxiliary data, such as depth, camera calibrations, or camera positions, commonly available in real-world scenarios. We propose a lightweight modification to CUT3R, incorporating a dedicated encoder for each modality to extract features, which are fused with RGB image tokens via zero convolution. This flexible design enables seamless integration of any combination of prior information during inference. Evaluated across multiple benchmarks, including 3D reconstruction and other multi-view tasks, our approach demonstrates significant performance improvements, showing its ability to effectively utilize available priors while maintaining compatibility with varying input modalities.
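The abstract's fusion mechanism can be illustrated with a minimal NumPy sketch. Zero convolution (popularized by ControlNet-style conditioning) is a projection whose weights and bias start at zero, so the prior branch contributes nothing at initialization and the model initially behaves like the RGB-only backbone. All names here (`ZeroLinear`, `fuse_prior`) are illustrative assumptions, not identifiers from the paper's code, and the real model uses learned encoders rather than raw features.

```python
import numpy as np

class ZeroLinear:
    """A 1x1 'zero convolution' over tokens: a linear projection whose
    weights and bias are initialized to zero, so its output starts at 0."""
    def __init__(self, dim):
        self.W = np.zeros((dim, dim))  # would be trained in practice
        self.b = np.zeros(dim)

    def __call__(self, x):
        return x @ self.W + self.b

def fuse_prior(rgb_tokens, prior_feats, zero_proj):
    """Additively inject encoded prior features (depth, calibration, pose)
    into the RGB token stream. At initialization the addition is exactly
    zero, so training can gradually open the prior pathway without
    disrupting the pretrained RGB backbone."""
    return rgb_tokens + zero_proj(prior_feats)

# Hypothetical shapes: 4 tokens, 8-dim features.
rgb = np.random.randn(4, 8)
prior = np.random.randn(4, 8)
fused = fuse_prior(rgb, prior, ZeroLinear(8))
```

Because the projection starts at zero, `fused` equals `rgb` before any training step, which is what makes this design safe to bolt onto a pretrained model like CUT3R.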
Community
G-CUT3R is a fast, feed-forward 3D reconstruction model that fuses RGB images with priors commonly available in real-world settings (camera calibrations, poses, depth), enabling flexible and efficient reconstruction with state-of-the-art results.