POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World Paper • 2403.05856 • Published Mar 9, 2024
Unveiling Visual Biases in Audio-Visual Localization Benchmarks Paper • 2409.06709 • Published Aug 25, 2024
Do Egocentric Video-Language Models Truly Understand Hand-Object Interactions? Paper • 2405.17719 • Published May 28, 2024
TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM Paper • 2503.13377 • Published Mar 17, 2025