Can Vision Language Models Infer Human Gaze Direction? A Controlled Study • arXiv:2506.05412 • Published Jun 2025
4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time • arXiv:2506.18890 • Published Jun 2025
Frame In-N-Out: Unbounded Controllable Image-to-Video Generation • arXiv:2505.21491 • Published May 2025
VEGGIE: Instructional Editing and Reasoning of Video Concepts with Grounded Generation • arXiv:2503.14350 • Published Mar 18, 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation • arXiv:2504.16060 • Published Apr 22, 2025
DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences • arXiv:2406.03008 • Published Jun 5, 2024
Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models • arXiv:2407.07035 • Published Jul 9, 2024
Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors • arXiv:2502.13311 • Published Feb 18, 2025
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel • arXiv:2412.08467 • Published Dec 11, 2024
LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent • arXiv:2309.12311 • Published Sep 21, 2023
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions • arXiv:2406.09264 • Published Jun 13, 2024
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination • arXiv:2406.05132 • Published Jun 7, 2024
Towards Collaborative Plan Acquisition through Theory of Mind Modeling in Situated Dialogue • arXiv:2305.11271 • Published May 18, 2023
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation • arXiv:2402.16846 • Published Feb 26, 2024
CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation • arXiv:2310.13165 • Published Oct 19, 2023