arxiv:2505.02156

Think on your Feet: Adaptive Thinking via Reinforcement Learning for Social Agents

Published on May 4 · Submitted by iiiiwis on May 6

Abstract

Effective social intelligence simulation requires language agents to dynamically adjust reasoning depth, a capability notably absent in current approaches. Existing methods either lack this kind of reasoning capability or enforce uniform long chain-of-thought reasoning across all scenarios, resulting in excessive token usage and inappropriate social simulation. In this paper, we propose Adaptive Mode Learning (AML), which strategically selects from four thinking modes (intuitive reaction → deep contemplation) based on real-time context. Our framework's core innovation, the Adaptive Mode Policy Optimization (AMPO) algorithm, introduces three key advancements over existing methods: (1) multi-granular thinking mode design, (2) context-aware mode switching across social interactions, and (3) token-efficient reasoning via depth-adaptive processing. Extensive experiments on social intelligence tasks confirm that AML achieves 15.6% higher task performance than state-of-the-art methods. Notably, our method outperforms GRPO by 7.0% with 32.8% shorter reasoning chains. These results demonstrate that context-sensitive thinking mode selection, as implemented in AMPO, enables more human-like adaptive reasoning than GRPO's fixed-depth approach.
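
To make the abstract's idea concrete, here is a minimal Python sketch of context-aware mode switching with a token-efficiency penalty. It is an illustration only, not the paper's implementation: the mode names, token budgets, the goal-progress heuristic in select_mode, and the reward shaping are all assumptions; AMPO learns mode selection via reinforcement learning rather than a fixed rule.

```python
# Illustrative sketch only. Mode names, budgets, the select_mode heuristic,
# and the reward shaping below are assumptions, not AMPO's actual code.
from dataclasses import dataclass

# Four thinking modes ordered from intuitive reaction to deep contemplation,
# each with a rough token budget for the reasoning chain (hypothetical values).
MODES = {
    "intuitive": 0,    # reply directly, no explicit reasoning
    "shallow":   64,   # brief reasoning sketch
    "moderate":  256,  # multi-step reasoning
    "deep":      1024, # full chain-of-thought
}

@dataclass
class Turn:
    context: str          # dialogue history so far
    goal_progress: float  # 0..1 estimate of progress toward the social goal

def select_mode(turn: Turn) -> str:
    """Context-aware mode switching (hypothetical heuristic):
    escalate reasoning depth when the social goal is stalling."""
    if turn.goal_progress > 0.8:
        return "intuitive"
    if turn.goal_progress > 0.5:
        return "shallow"
    if turn.goal_progress > 0.2:
        return "moderate"
    return "deep"

def shaped_reward(task_reward: float, tokens_used: int, budget: int,
                  length_penalty: float = 1e-4) -> float:
    """Token-efficient objective: task reward minus a penalty for
    reasoning tokens spent beyond the selected mode's budget."""
    overshoot = max(0, tokens_used - budget)
    return task_reward - length_penalty * overshoot

if __name__ == "__main__":
    turn = Turn(context="Negotiation, turn 3", goal_progress=0.35)
    mode = select_mode(turn)
    print(mode, MODES[mode])                                   # moderate 256
    print(shaped_reward(1.0, tokens_used=400, budget=MODES[mode]))  # 0.9856
```

In AMPO itself the mode choice is optimized jointly with the response policy, so the agent learns when a quick intuitive reply suffices and when deep contemplation pays off; the hand-written threshold rule above only stands in for that learned policy.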

Community

Paper author and submitter:

We have released the code and data at: https://github.com/MozerWang/AMPO


Models citing this paper 0

Datasets citing this paper 1

Spaces citing this paper 0

Collections including this paper 3