I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders Paper • 2503.18878 • Published 9 days ago • 110
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking Paper • 2503.19855 • Published 8 days ago • 24
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning Paper • 2503.21620 • Published 6 days ago • 53
Can Large Vision Language Models Read Maps Like a Human? Paper • 2503.14607 • Published 15 days ago • 9
Where do Large Vision-Language Models Look at when Answering Questions? Paper • 2503.13891 • Published 16 days ago • 8
On the Acquisition of Shared Grammatical Representations in Bilingual Language Models Paper • 2503.03962 • Published 28 days ago • 3
LINGOLY-TOO: Disentangling Memorisation from Reasoning with Linguistic Templatisation and Orthographic Obfuscation Paper • 2503.02972 • Published 29 days ago • 23
AppAgentX: Evolving GUI Agents as Proficient Smartphone Users Paper • 2503.02268 • Published 30 days ago • 10
Predictive Data Selection: The Data That Predicts Is the Data That Teaches Paper • 2503.00808 • Published Mar 2 • 56
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs Paper • 2502.17422 • Published Feb 24 • 7
Introducing Visual Perception Token into Multimodal Large Language Model Paper • 2502.17425 • Published Feb 24 • 15
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve? Paper • 2502.17535 • Published Feb 24 • 8
VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model Paper • 2502.18906 • Published Feb 26 • 12
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? Paper • 2502.19361 • Published Feb 26 • 27
MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models Paper • 2502.14302 • Published Feb 20 • 9
VLM²-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues Paper • 2502.12084 • Published Feb 17 • 29
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks Paper • 2502.08235 • Published Feb 12 • 55
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published Feb 20 • 188