arxiv:2502.19455

FLAP: Fully-controllable Audio-driven Portrait Video Generation through 3D head conditioned diffusion model

Published on Feb 26

Authors:

Abstract

FLAP integrates explicit 3D parameters into diffusion models to enhance controllable and natural video generation from audio for realistic avatars.

AI-generated summary

Diffusion-based video generation techniques have significantly improved zero-shot talking-head avatar generation, enhancing the naturalness of both head motion and facial expressions. However, existing methods suffer from poor controllability, making them less applicable to real-world scenarios such as filmmaking and live streaming for e-commerce. To address this limitation, we propose FLAP, a novel approach that integrates explicit 3D intermediate parameters (head poses and facial expressions) into the diffusion model for end-to-end generation of realistic portrait videos. The proposed architecture allows the model to generate vivid portrait videos from audio while simultaneously incorporating additional control signals, such as head rotation angles and eye-blinking frequency. Furthermore, the decoupling of head pose and facial expression allows for independent control of each, offering precise manipulation of both the avatar's pose and facial expressions. We also demonstrate its flexibility in integrating with existing 3D head generation methods, bridging the gap between 3D model-based approaches and end-to-end diffusion techniques. Extensive experiments show that our method outperforms recent audio-driven portrait video models in both naturalness and controllability.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2502.19455 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2502.19455 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2502.19455 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.