POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World Paper • 2403.05856 • Published Mar 9, 2024
Unveiling Visual Biases in Audio-Visual Localization Benchmarks Paper • 2409.06709 • Published Aug 25, 2024
Do Egocentric Video-Language Models Truly Understand Hand-Object Interactions? Paper • 2405.17719 • Published May 28, 2024
Time-R1 Collection Time-R1: Post-Training Large Vision-Language Model for Temporal Video Grounding • 3 items • Updated 24 days ago
TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM Paper • 2503.13377 • Published Mar 17 • 2