Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues Paper • 2506.00958 • Published 6 days ago • 20
Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation Paper • 2505.18842 • Published 13 days ago • 35
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents Paper • 2505.15277 • Published 17 days ago • 98
G-FOCUS: Towards a Robust Method for Assessing UI Design Persuasiveness Paper • 2505.05026 • Published 30 days ago • 15
VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms Paper • 2503.14427 • Published Mar 18 • 19
DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding Paper • 2411.19527 • Published Nov 29, 2024 • 10
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation Paper • 2410.13232 • Published Oct 17, 2024 • 45