arxiv:2504.07961

Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction

Published on Apr 10
· Submitted by jzr99 on Apr 11
Abstract

We introduce Geo4D, a method to repurpose video diffusion models for monocular 3D reconstruction of dynamic scenes. By leveraging the strong dynamic prior captured by such video models, Geo4D can be trained using only synthetic data while generalizing well to real data in a zero-shot manner. Geo4D predicts several complementary geometric modalities, namely point, depth, and ray maps. It uses a new multi-modal alignment algorithm to align and fuse these modalities, as well as multiple sliding windows, at inference time, thus obtaining robust and accurate 4D reconstruction of long videos. Extensive experiments across multiple benchmarks show that Geo4D significantly surpasses state-of-the-art video depth estimation methods, including recent methods such as MonST3R, which are also designed to handle dynamic scenes.
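The abstract mentions fusing predictions from multiple sliding windows at inference time. The exact alignment algorithm is not given on this page; as an illustrative sketch only, overlapping per-window depth predictions can be merged by solving a least-squares scale and shift for each window against the running fused estimate, then averaging. All function names and the scale-shift parameterization here are assumptions, not Geo4D's actual procedure.

```python
import numpy as np

def align_scale_shift(pred, ref):
    """Least-squares scale s and shift t so that s * pred + t ~ ref."""
    p = pred.ravel()
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, ref.ravel(), rcond=None)
    return s, t

def fuse_windows(depth_windows, starts, num_frames):
    """Fuse per-window depth maps (T_w, H, W) into one (num_frames, H, W) sequence.

    Each window is aligned (scale/shift) to the running fused estimate on its
    overlapping frames, then blended into the fused sequence by averaging.
    """
    H, W = depth_windows[0].shape[1:]
    fused = np.zeros((num_frames, H, W))
    count = np.zeros(num_frames)
    for win, s0 in zip(depth_windows, starts):
        T = win.shape[0]
        overlap = [t for t in range(T) if count[s0 + t] > 0]
        if overlap:
            ref = np.stack([fused[s0 + t] / count[s0 + t] for t in overlap])
            pred = np.stack([win[t] for t in overlap])
            s, t = align_scale_shift(pred, ref)
            win = s * win + t
        for t in range(T):
            fused[s0 + t] += win[t]
            count[s0 + t] += 1
    return fused / count[:, None, None]
```

A usage pattern would be to run the video model on windows starting at frames 0, 2, 4, ... and pass the per-window depth stacks with their start indices to `fuse_windows`.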

Community


Project page: https://geo4d.github.io/


Geo4D repurposes a video diffusion model for monocular 4D reconstruction.

  • The first video diffusion-based method that leverages viewpoint-invariant point maps for 4D scene reconstruction.
  • Predicting partially redundant geometric modalities and fusing them improves 4D prediction accuracy.
  • Achieves state-of-the-art performance on video depth estimation and comparable performance on camera pose estimation.
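The bullets above mention camera pose estimation alongside the predicted ray maps. How Geo4D recovers poses is not detailed on this page; as a hedged sketch, if a ray map is represented as per-pixel ray origins and directions (an assumption about the parameterization), the camera center can be estimated as the least-squares point closest to all rays:

```python
import numpy as np

def camera_center_from_rays(origins, dirs):
    """Least-squares point closest to all rays r_i(t) = o_i + t * d_i.

    Minimizes sum_i || (I - d_i d_i^T)(c - o_i) ||^2, i.e. the squared
    perpendicular distance from c to each ray, by solving A c = b.
    """
    d = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
    eye = np.eye(3)
    # P_i = I - d_i d_i^T projects onto the plane orthogonal to ray i
    P = eye[None] - d[:, :, None] * d[:, None, :]
    A = P.sum(axis=0)
    b = (P @ origins[:, :, None]).sum(axis=0).ravel()
    return np.linalg.solve(A, b)
```

With at least two non-parallel rays the normal matrix `A` is invertible; rays that exactly pass through a common point recover that point up to floating-point error.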

