Abstract
The Adaptive Reasoning Model (ARM), trained with Ada-GRPO, adaptively selects reasoning formats to reduce token usage and improve efficiency across its reasoning modes.
While large reasoning models demonstrate strong performance on complex tasks, they lack the ability to adjust reasoning token usage based on task difficulty. This often leads to the "overthinking" problem -- excessive and unnecessary reasoning -- which, although potentially mitigated by human intervention to control the token budget, still fundamentally contradicts the goal of achieving fully autonomous AI. In this work, we propose the Adaptive Reasoning Model (ARM), a reasoning model capable of adaptively selecting appropriate reasoning formats based on the task at hand. These formats include three efficient ones -- Direct Answer, Short CoT, and Code -- as well as a more elaborate format, Long CoT. To train ARM, we introduce Ada-GRPO, an adaptation of Group Relative Policy Optimization (GRPO) that addresses the format-collapse issue in traditional GRPO. Ada-GRPO enables ARM to achieve high token efficiency, reducing token usage by an average of 30% (and by up to 70%) while maintaining performance comparable to a model that relies solely on Long CoT. It not only improves inference efficiency through reduced token generation but also brings a 2x speedup in training. In addition to the default Adaptive Mode, ARM supports two further reasoning modes: 1) Instruction-Guided Mode, which lets users explicitly specify the reasoning format via special tokens -- ideal when the appropriate format is known for a batch of tasks; and 2) Consensus-Guided Mode, which aggregates the outputs of the three efficient formats and resorts to Long CoT in case of disagreement, prioritizing performance at the cost of higher token usage.
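The Consensus-Guided Mode described above can be pictured with a minimal sketch like the one below. The `generate` helper, the format names, and the exact-match agreement check are illustrative assumptions for this sketch, not the authors' implementation.

```python
# Hypothetical sketch of Consensus-Guided Mode: query the three efficient
# formats first, and fall back to Long CoT only when they disagree.
# `generate(prompt, fmt)` is an assumed helper that prepends the special
# token for the requested reasoning format and returns the final answer.

EFFICIENT_FORMATS = ["direct_answer", "short_cot", "code"]

def consensus_guided_answer(prompt, generate):
    answers = [generate(prompt, fmt) for fmt in EFFICIENT_FORMATS]
    if len(set(answers)) == 1:
        # All three efficient formats agree: accept the cheap answer.
        return answers[0]
    # Disagreement: spend more tokens on the elaborate Long CoT format.
    return generate(prompt, "long_cot")
```

In Instruction-Guided Mode, the same kind of call would simply be issued once with the user-specified format token; in the default Adaptive Mode, the trained model picks the format itself.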
Community
ARM: A reasoning model capable of adaptively selecting reasoning formats based on the task, achieving a better trade-off between effectiveness and efficiency!
Project Page: https://team-arm.github.io/arm/
Data & Model: https://huggingface.co/collections/arm-team/arm-68302ffc0dd4f7f154cf3a23
Nice work!
Just finished reading. Great work!
And thanks for linking to the L1 paper as well.
This is an excellent piece of work! I would like to suggest citing the paper below which also trains/prompts LLM to conduct reasoning format selection.
arxiv.org/abs/2409.19381
HybridMind: Meta-Selection of Natural Language and Symbolic Language for Enhanced LLM Reasoning.
This work trains an external medium-sized LM (or prompts a powerful LLM) to perform reasoning-format meta-selection between reasoning in natural language (CoT) and reasoning in symbolic language (Python or first-order logic), before the model proceeds to generate the reasoning chain itself.
I respectfully suggest including a discussion of their relationship in the manuscript, as this would enhance the comprehensiveness of the analysis and further advance research in this important area.
Thanks for your suggestion, Simeng! HybridMind is a nice work, and we will include a discussion of it in our next revision.