Consistent Time-of-Flight Depth Denoising via Graph-Informed Geometric Attention
Abstract
A novel ToF depth denoising network uses motion-invariant graph fusion and adaptive filters to improve temporal stability and spatial sharpness, achieving state-of-the-art performance.
Depth images captured by Time-of-Flight (ToF) sensors are prone to noise, requiring denoising for reliable downstream applications. Previous works either focus on single-frame processing, or perform multi-frame processing without considering depth variations at corresponding pixels across frames, leading to undesirable temporal inconsistency and spatial ambiguity. In this paper, we propose a novel ToF depth denoising network leveraging motion-invariant graph fusion to simultaneously enhance temporal stability and spatial sharpness. Specifically, despite depth shifts across frames, graph structures exhibit temporal self-similarity, enabling cross-frame geometric attention for graph fusion. Then, by incorporating an image smoothness prior on the fused graph and data fidelity term derived from ToF noise distribution, we formulate a maximum a posterior problem for ToF denoising. Finally, the solution is unrolled into iterative filters whose weights are adaptively learned from the graph-informed geometric attention, producing a high-performance yet interpretable network. Experimental results demonstrate that the proposed scheme achieves state-of-the-art performance in terms of accuracy and consistency on synthetic DVToF dataset and exhibits robust generalization on the real Kinectv2 dataset. Source code will be released at https://github.com/davidweidawang/GIGA-ToF{https://github.com/davidweidawang/GIGA-ToF}.
Community
Depth images captured by Time-of-Flight (ToF) sensors are prone to noise, requiring denoising for reliable downstream applications. Previous works either focus on single-frame processing, or perform multi-frame processing without considering depth variations at corresponding pixels across frames, leading to undesirable temporal inconsistency and spatial ambiguity. In this paper, we propose a novel ToF depth denoising network leveraging motion-invariant graph fusion to simultaneously enhance temporal stability and spatial sharpness. Specifically, despite depth shifts across frames, graph structures exhibit temporal self-similarity, enabling cross-frame geometric attention for graph fusion. Then, by incorporating an image smoothness prior on the fused graph and data fidelity term derived from ToF noise distribution, we formulate a maximum a posterior problem for ToF denoising. Finally, the solution is unrolled into iterative filters whose weights are adaptively learned from the graph-informed geometric attention, producing a high-performance yet interpretable network. Experimental results demonstrate that the proposed scheme achieves state-of-the-art performance in terms of accuracy and consistency on synthetic DVToF dataset and exhibits robust generalization on the real Kinectv2 dataset.
โญ๏ธ Highlights
๐ Cross-frame graph fusion for ToF denoising
Fuses motion-invariant graph structures across frames to achieve both temporal consistency and spatial sharpness.
๐ง Graph-informed geometric attention (GIGA)
Learns graph edges via attention from geometric features, enabling accurate cross-frame correspondence.
๐ฌInterpretable and robust design
Unrolls MAP optimization with graph Laplacian regularization into a network, achieving high denoising accuracy and generalization to real ToF data.
๐ State-of-the-art performance
Outperforms prior works by at least 37.9% MAE and 13.2% TEPE on DVToF; robust on real Kinect v2 data without fine-tuning.
this paper has been accept by ICCV 2025 :)
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper