arxiv:2501.11899

LASER: Lip Landmark Assisted Speaker Detection for Robustness

Published on Jan 21, 2025

AI-generated summary

LASER, a lip landmark-assisted active speaker detection model, improves robustness and accuracy by incorporating lip landmarks and an auxiliary consistency loss, outperforming existing models on in-domain and out-of-domain benchmarks, including high-noise real-world scenarios.

Abstract

Active Speaker Detection (ASD) aims to identify who is speaking in complex visual scenes. While humans naturally rely on lip-audio synchronization, existing ASD models often misclassify non-speaking instances when lip movements and audio are unsynchronized. To address this, we propose Lip landmark Assisted Speaker dEtection for Robustness (LASER), which explicitly incorporates lip landmarks during training to guide the model's attention to speech-relevant regions. Given a face track, LASER extracts visual features and encodes 2D lip landmarks into dense maps. To handle failure cases such as low resolution or occlusion, we introduce an auxiliary consistency loss that aligns lip-aware and face-only predictions, removing the need for landmark detectors at test time. LASER outperforms state-of-the-art models across both in-domain and out-of-domain benchmarks. To further evaluate robustness in realistic conditions, we introduce LASER-bench, a curated dataset of modern video clips with varying levels of background noise. On the high-noise subset, LASER improves mAP by 3.3 and 4.3 points over LoCoNet and TalkNet, respectively, demonstrating strong resilience to real-world acoustic challenges.
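The abstract describes two mechanisms: encoding 2D lip landmarks into dense maps that guide the visual branch, and an auxiliary consistency loss that aligns lip-aware and face-only predictions so no landmark detector is needed at test time. The sketch below illustrates both ideas only at the level the abstract states them; all function names, tensor shapes, and the KL-based formulation of the consistency term are illustrative assumptions, not LASER's actual implementation.

```python
# Minimal PyTorch sketch of the two mechanisms described in the abstract.
# Names, shapes, and loss formulation are hypothetical illustrations.

import torch
import torch.nn.functional as F


def landmarks_to_dense_map(landmarks, height, width, sigma=2.0):
    """Rasterize 2D lip landmarks (normalized to [0, 1]) into a
    single-channel Gaussian heatmap of shape (B, 1, H, W).

    landmarks: (B, N, 2) tensor of (x, y) coordinates for one frame.
    """
    ys = torch.arange(height, device=landmarks.device).float()
    xs = torch.arange(width, device=landmarks.device).float()
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")        # (H, W)

    # Scale normalized landmark coordinates to pixel coordinates.
    lx = landmarks[..., 0] * (width - 1)                          # (B, N)
    ly = landmarks[..., 1] * (height - 1)                         # (B, N)

    # Squared distance from every pixel to every landmark.
    d2 = (grid_x[None, None] - lx[..., None, None]) ** 2 \
       + (grid_y[None, None] - ly[..., None, None]) ** 2          # (B, N, H, W)

    # One Gaussian bump per landmark, merged with a max over landmarks.
    heatmap = torch.exp(-d2 / (2 * sigma ** 2)).amax(dim=1, keepdim=True)
    return heatmap                                                # (B, 1, H, W)


def consistency_loss(lip_aware_logits, face_only_logits):
    """Auxiliary term that pulls the face-only branch toward the lip-aware
    branch, so landmark detection can be dropped at inference. Framed here
    as a KL divergence over speaking/non-speaking scores; the paper's exact
    formulation may differ."""
    teacher = F.log_softmax(lip_aware_logits.detach(), dim=-1)
    student = F.log_softmax(face_only_logits, dim=-1)
    return F.kl_div(student, teacher, log_target=True, reduction="batchmean")
```

In this reading, the dense landmark map would be concatenated with (or used to modulate) the face-track features during training, while the consistency term lets the face-only pathway mimic the lip-aware one, which is how the abstract motivates removing the landmark detector at test time.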
