Can Vision Language Models Infer Human Gaze Direction? A Controlled Study • arXiv:2506.05412 • Published Jun 2025
4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time • arXiv:2506.18890 • Published Jun 2025
Frame In-N-Out: Unbounded Controllable Image-to-Video Generation • arXiv:2505.21491 • Published May 2025
VEGGIE: Instructional Editing and Reasoning of Video Concepts with Grounded Generation • arXiv:2503.14350 • Published Mar 18, 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation • arXiv:2504.16060 • Published Apr 22, 2025
DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences • arXiv:2406.03008 • Published Jun 5, 2024
Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models • arXiv:2407.07035 • Published Jul 9, 2024
Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors • arXiv:2502.13311 • Published Feb 18, 2025
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel • arXiv:2412.08467 • Published Dec 11, 2024
LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent • arXiv:2309.12311 • Published Sep 21, 2023
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions • arXiv:2406.09264 • Published Jun 13, 2024
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination • arXiv:2406.05132 • Published Jun 7, 2024
Towards Collaborative Plan Acquisition through Theory of Mind Modeling in Situated Dialogue • arXiv:2305.11271 • Published May 18, 2023
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation • arXiv:2402.16846 • Published Feb 26, 2024
CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation • arXiv:2310.13165 • Published Oct 19, 2023