Can Vision Language Models Infer Human Gaze Direction? A Controlled Study Paper • 2506.05412 • Published Jun 5 • 4
4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time Paper • 2506.18890 • Published Jun 23 • 4
Improving Zero-Shot Object-Level Change Detection by Incorporating Visual Correspondence Paper • 2501.05555 • Published Jan 9 • 1
TAB: Transformer Attention Bottlenecks enable User Intervention and Debugging in Vision-Language Models Paper • 2412.18675 • Published Dec 24, 2024 • 1
VEGGIE: Instructional Editing and Reasoning of Video Concepts with Grounded Generation Paper • 2503.14350 • Published Mar 18
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation Paper • 2504.16060 • Published Apr 22
Core Knowledge Deficits in Multi-Modal Language Models Paper • 2410.10855 • Published Oct 6, 2024 • 1
Probing Mechanical Reasoning in Large Vision Language Models Paper • 2410.00318 • Published Oct 1, 2024
DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences Paper • 2406.03008 • Published Jun 5, 2024
Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models Paper • 2407.07035 • Published Jul 9, 2024
Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors Paper • 2502.13311 • Published Feb 18 • 1
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions Paper • 2406.09264 • Published Jun 13, 2024 • 2