VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published 16 days ago • 71
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning Paper • 2507.00432 • Published Jul 1 • 72
VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments Paper • 2506.02387 • Published Jun 3 • 57
MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs Paper • 2505.21327 • Published May 27 • 83
GTR-CoT: Graph Traversal as Visual Chain of Thought for Molecular Structure Recognition Paper • 2506.07553 • Published Jun 9 • 15
PM4Bench: A Parallel Multilingual Multi-Modal Multi-task Benchmark for Large Vision Language Model Paper • 2503.18484 • Published Mar 24
Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining Paper • 2410.08102 • Published Oct 10, 2024 • 20
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios Paper • 2408.17267 • Published Aug 30, 2024 • 24
CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis Paper • 2408.14765 • Published Aug 27, 2024 • 15