SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency
Abstract
We present Stable Video 4D (SV4D), a latent video diffusion model for multi-frame and multi-view consistent dynamic 3D content generation. Unlike previous methods that rely on separately trained generative models for video generation and novel-view synthesis, we design a unified diffusion model to generate novel-view videos of dynamic 3D objects. Specifically, given a monocular reference video, SV4D generates novel views for each video frame that are temporally consistent. We then use the generated novel-view videos to optimize an implicit 4D representation (dynamic NeRF) efficiently, without the need for the cumbersome SDS-based optimization used in most prior works. To train our unified novel-view video generation model, we curated a dynamic 3D object dataset from the existing Objaverse dataset. Extensive experimental results on multiple datasets and user studies demonstrate SV4D's state-of-the-art performance on novel-view video synthesis as well as 4D generation compared to prior works.
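The abstract describes a two-stage pipeline: a single diffusion model first generates a grid of novel-view videos indexed by (view, frame), and a dynamic NeRF is then fit to that grid with a direct photometric loss rather than SDS. The sketch below illustrates that structure in PyTorch under stated assumptions; it is not the released SV4D code, and every name in it (`DynamicNeRF`, `render`, the fake stage-1 output) is a hypothetical placeholder.

```python
# Hypothetical sketch of an SV4D-style two-stage pipeline (not the official
# implementation). Stage 1's diffusion sampler is mocked with random targets;
# Stage 2 fits a toy time-conditioned radiance field by plain reconstruction.
import torch
import torch.nn as nn


class DynamicNeRF(nn.Module):
    """Toy dynamic field: (x, y, z, t) -> RGB. A real dynamic NeRF would also
    predict density and be trained through volume rendering."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, xyz: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([xyz, t], dim=-1))


def render(field: DynamicNeRF, view: int, t: float, hw: int = 32) -> torch.Tensor:
    """Stand-in renderer: queries the field on a fixed pixel grid shifted by
    the view index. Real code would cast camera rays and volume-render."""
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, hw), torch.linspace(-1, 1, hw), indexing="ij")
    xyz = torch.stack([xs, ys, torch.full_like(xs, view * 0.1)], dim=-1)
    tt = torch.full((hw, hw, 1), t)
    return field(xyz.reshape(-1, 3), tt.reshape(-1, 1)).reshape(hw, hw, 3)


# Stage 1 (assumed interface): the unified diffusion model would return a
# (views, frames, H, W, 3) grid of novel-view videos conditioned on the
# monocular reference video. Mocked here with random pixels.
num_views, num_frames, hw = 4, 5, 32
novel_view_videos = torch.rand(num_views, num_frames, hw, hw, 3)

# Stage 2: optimize the dynamic NeRF with an L2 photometric loss against the
# generated grid -- direct reconstruction, no score-distillation (SDS) loop.
field = DynamicNeRF()
opt = torch.optim.Adam(field.parameters(), lr=1e-3)
for step in range(200):
    v = torch.randint(num_views, ()).item()
    f = torch.randint(num_frames, ()).item()
    pred = render(field, view=v, t=f / (num_frames - 1))
    loss = ((pred - novel_view_videos[v, f]) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the generated views are already multi-frame and multi-view consistent, a plain reconstruction loss suffices here; this is the efficiency point the abstract makes against SDS-based optimization.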
Community
Similar papers recommended by the Semantic Scholar API (via Librarian Bot):
- 4Diffusion: Multi-view Video Diffusion Model for 4D Generation (2024)
- Animate3D: Animating Any 3D Model with Multi-view Video Diffusion (2024)
- Vivid-ZOO: Multi-View Video Generation with Diffusion Model (2024)
- 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models (2024)
- Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control (2024)