Submitted by ChocoWu 59 Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation · 11 authors 4
Submitted by ChenYi99 30 Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 · 7 authors 3
Submitted by tarsur909 29 CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis · 9 authors 2
Submitted by weizhiwang 26 Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources · 5 authors 7
Submitted by wbhu-tc 20 GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors · 6 authors 2
Submitted by AndrewZhou924 20 Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models · 8 authors 2
Submitted by xw-eric 18 Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents · 6 authors 2
Submitted by ColorfulAI 17 OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts · 6 authors 2
Submitted by pabloruizponce 16 MixerMDM: Learnable Composition of Human Motion Diffusion Models · 5 authors 2
Submitted by Ray121381 16 Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models · 10 authors 2
Submitted by akhaliq 15 Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems? · 7 authors 11
Submitted by hbXNov 12 When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning · 7 authors 1
Submitted by deepkyu 12 Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features · 9 authors 2
Submitted by carboncoo 10 AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization · 12 authors 3
Submitted by xk-huang 7 m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models · 5 authors 2
Submitted by akhaliq 7 Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead · 11 authors 2
Submitted by akhaliq 6 Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs · 4 authors 2
Submitted by MaksimSTW 5 Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base · 9 authors 2
Submitted by MrezaPRZ 4 Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL · 8 authors 2
Submitted by onandon 2 DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting · 2 authors 2
Submitted by akhaliq 2 ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning · 5 authors 2
Submitted by rdkarim 1 MB-ORES: A Multi-Branch Object Reasoner for Visual Grounding in Remote Sensing · 3 authors 2