CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
Abstract
We present CAT4D, a method for creating 4D (dynamic 3D) scenes from monocular video. CAT4D leverages a multi-view video diffusion model trained on a diverse combination of datasets to enable novel view synthesis at any specified camera poses and timestamps. Combined with a novel sampling approach, this model can transform a single monocular video into a multi-view video, enabling robust 4D reconstruction via optimization of a deformable 3D Gaussian representation. We demonstrate competitive performance on novel view synthesis and dynamic scene reconstruction benchmarks, and highlight the creative capabilities for 4D scene generation from real or generated videos. See our project page for results and interactive demos: cat-4d.github.io.
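To make the pipeline described in the abstract concrete, here is a minimal sketch of its two stages: sampling a multi-view video at a grid of (camera pose, timestamp) pairs, then fitting a deformable 3D Gaussian representation to it. Everything below is a hypothetical illustration under assumed tensor shapes and function names (`sample_multiview_video`, `fit_deformable_gaussians`, the stubbed `sampler`); it is not the authors' released code or their actual sampling procedure.

```python
import numpy as np

def sample_multiview_video(mono_video, target_poses, target_times, sampler):
    """Sketch: query a multi-view video diffusion sampler at a grid of
    (camera pose, timestamp) pairs, conditioned on the input monocular video.

    mono_video:   (T, H, W, 3) input monocular frames
    target_poses: list of V camera poses (e.g., 4x4 world-to-camera matrices)
    target_times: list of K timestamps to render
    sampler:      callable(mono_video, pose, time) -> (H, W, 3) generated frame
    """
    T, H, W, _ = mono_video.shape
    grid = np.zeros((len(target_poses), len(target_times), H, W, 3), dtype=np.float32)
    for v, pose in enumerate(target_poses):
        for k, t in enumerate(target_times):
            grid[v, k] = sampler(mono_video, pose, t)  # one generated view/time frame
    return grid  # (V, K, H, W, 3) multi-view video

def fit_deformable_gaussians(multiview_video, target_poses, target_times):
    """Placeholder for the reconstruction stage: optimize a deformable 3D
    Gaussian representation against the generated multi-view video."""
    raise NotImplementedError("reconstruction stage not sketched here")

if __name__ == "__main__":
    # Dummy data: an 8-frame 64x64 video, 4 target views, 8 timestamps.
    video = np.random.rand(8, 64, 64, 3).astype(np.float32)
    poses = [np.eye(4) for _ in range(4)]
    times = list(range(8))
    # Stand-in for the diffusion model: just echo the input frame at time t.
    dummy_sampler = lambda vid, pose, t: vid[min(t, len(vid) - 1)]
    mv = sample_multiview_video(video, poses, times, dummy_sampler)
    print(mv.shape)  # (4, 8, 64, 64, 3)
```

With dummy inputs the script runs end to end and prints the shape of the generated view-time grid; in the paper's actual pipeline the sampler would be the trained multi-view video diffusion model and the grid would feed the deformable 3D Gaussian optimization.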
Community
We have to appreciate that they at least have a paper in this case. Genie 2 doesn't even have a paper, only a web page.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion (2024)
- GenXD: Generating Any 3D and 4D Scenes (2024)
- Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention (2024)
- Generating 3D-Consistent Videos from Unposed Internet Photos (2024)
- VistaDream: Sampling multiview consistent images for single-view scene reconstruction (2024)
- 3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors (2024)
- ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning (2024)
Why does Google only show videos? Can they share some code or a demo?