Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models Paper • 2505.03821 • Published May 3, 2025
What Matters in Hierarchical Search for Combinatorial Reasoning Problems? Paper • 2406.03361 • Published Jun 5, 2024
Seeing Through Their Eyes: Evaluating Visual Perspective Taking in Vision Language Models Paper • 2409.12969 • Published Sep 2, 2024
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games Paper • 2411.13543 • Published Nov 20, 2024
When All Options Are Wrong: Evaluating Large Language Model Robustness with Incorrect Multiple-Choice Options Paper • 2409.00113 • Published Aug 27, 2024