Submitted by Weiyun1025 239 InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models · 47 authors 8
Submitted by LIKirin 119 PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters · 5 authors 7
Submitted by cuijiaxing 47 Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability · 3 authors 2
Submitted by wenhu 42 VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning · 6 authors 2
Submitted by starriver030515 38 FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding · 7 authors 3
Submitted by mponty 34 Iterative Self-Training for Code Generation via Reinforced Re-Ranking · 3 authors 2
Submitted by DogNeverSleep 30 Mavors: Multi-granularity Video Representation for Multimodal Large Language Model · 15 authors 2
Submitted by xhluca 27 AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories · 10 authors 2
Submitted by AIRobotZ 21 S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models · 5 authors 3
Submitted by ztwang 19 DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training · 4 authors 2
Submitted by leoozy 17 Breaking the Data Barrier -- Building GUI Agents Through Task Generalization · 7 authors 2
Submitted by brucelyu 15 SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users · 21 authors 3
Submitted by codezakh 13 Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems · 5 authors 2
Submitted by LibraTree 11 VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search · 8 authors 4
Submitted by akhaliq 10 M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models · 6 authors 2
Submitted by yyamada 10 The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search · 8 authors 2
Submitted by parshinsh 8 LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models · 6 authors 2
Submitted by ChrisJuan 7 EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety · 10 authors 3
Submitted by mqliu 4 LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models · 11 authors 2
Submitted by Rexhaif 4 DeepSeek vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization? · 8 authors 2
Submitted by kpzhang996 4 MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models · 20 authors 2
Submitted by johnhalloran 3 MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits · 2 authors 2
Submitted by SteveZeyuZhang 1 DiffuMural: Restoring Dunhuang Murals with Multi-scale Diffusion · 9 authors 2