AdaptThink: LLM Can Learn When to Think

🤗 HF Collections • 💻 Github Repo • 📃 Paper

🔍 Table of Contents

🤖️ AdaptThink
⚙️ Released Models
📊 Evaluation Results
📝 Citation

🤖️ AdaptThink

We present AdaptThink, a novel reinforcement learning (RL) algorithm that enables reasoning models to adaptively choose between Thinking and NoThinking modes according to the difficulty of each input problem, thereby achieving automatic hybrid reasoning. Specifically, the model engages in thinking only when it judges the problem to be challenging; for simpler questions, it bypasses the thinking process and directly produces a concise final solution. This approach substantially reduces inference costs while further improving overall performance.
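To make the two modes concrete, the following is a minimal sketch of how one might label a generated response as Thinking or NoThinking. It assumes the DeepSeek-R1-style convention in which reasoning is wrapped in `<think>` ... `</think>` tags, and that a NoThinking response has an empty thinking segment; the helper names `split_response` and `is_nothinking` are hypothetical and not part of the released code.

```python
# Assumption: thinking is delimited by <think> ... </think>, as in
# DeepSeek-R1-style models. These helpers are illustrative only.
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_response(response):
    """Split a response into (thinking, answer).

    `answer` is None when no closing tag is found, e.g. when the
    generation was truncated before the thinking segment ended.
    """
    text = response.lstrip()
    # The opening tag may already be part of the prompt template,
    # so it is optional here.
    if text.startswith(THINK_OPEN):
        text = text[len(THINK_OPEN):]
    thinking, close, answer = text.partition(THINK_CLOSE)
    if not close:
        return text.strip(), None
    return thinking.strip(), answer.strip()


def is_nothinking(response):
    """True when the thinking segment is closed and empty."""
    thinking, answer = split_response(response)
    return answer is not None and thinking == ""
```

Applied to a batch of responses, `is_nothinking` gives a simple way to estimate the share of NoThinking outputs on a dataset.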

⚙️ Released Models

All Available Datasets and Models

We apply the AdaptThink algorithm to DeepSeek-R1-Distill-Qwen-1.5B with $\delta$ ranging from 0 to 0.1, and to DeepSeek-R1-Distill-Qwen-7B with $\delta=0.05$. A larger $\delta$ results in a higher proportion of NoThinking responses, which reduces inference costs further but also diminishes the resulting improvement in accuracy.
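The trade-off controlled by $\delta$ can be illustrated with a small sketch. It assumes, as a simplification, that $\delta$ acts as a constant bonus added to the advantage of NoThinking responses, that the reward is 1 for a correct answer and 0 otherwise, and that the baseline is the mean reward of a reference model on the same problem. The function names are hypothetical, and the actual training objective is the one defined in the paper.

```python
def sketch_advantage(reward, ref_mean_reward, nothinking, delta):
    """Simplified per-response advantage.

    Assumption: NoThinking responses receive a constant bonus `delta`
    on top of (reward - reference baseline).
    """
    bonus = delta if nothinking else 0.0
    return bonus + reward - ref_mean_reward


def expected_nothinking_advantage(acc_nothinking, acc_ref, delta):
    """Expected advantage of answering without thinking.

    With 0/1 rewards the expected reward equals accuracy, so skipping
    thinking is favoured when acc_nothinking > acc_ref - delta.
    """
    return delta + acc_nothinking - acc_ref


# Hypothetical problem where NoThinking is slightly less accurate
# (0.75) than the reference model (0.80).
small = expected_nothinking_advantage(0.75, 0.80, delta=0.02)
large = expected_nothinking_advantage(0.75, 0.80, delta=0.1)
print(small < 0, large > 0)
```

Under these assumptions, a small $\delta$ only favours NoThinking where it costs almost no accuracy, while a larger $\delta$ tolerates a bigger accuracy gap, which matches the behaviour described above.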

All the trained models are available on HuggingFace.

Name	HF Repo
AdaptThink-1.5B-delta0	🤗 HF Repo
AdaptThink-1.5B-delta0.01	🤗 HF Repo
AdaptThink-1.5B-delta0.02	🤗 HF Repo
AdaptThink-1.5B-delta0.05	🤗 HF Repo
AdaptThink-1.5B-delta0.075	🤗 HF Repo
AdaptThink-1.5B-delta0.1	🤗 HF Repo
AdaptThink-7B-delta0.05	🤗 HF Repo
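The checkpoints in the table can be loaded with the Hugging Face `transformers` library. The sketch below assumes that every model lives under the `THU-KEG` organization and follows the naming pattern `AdaptThink-<size>-delta<value>`; this matches the `THU-KEG/AdaptThink-7B-delta0.05` repository but is an assumption for the other entries. The helpers `repo_id` and `load_adaptthink` are hypothetical conveniences, not part of the release.

```python
# Delta values listed in the table above, keyed by model size.
AVAILABLE = {
    "1.5B": ["0", "0.01", "0.02", "0.05", "0.075", "0.1"],
    "7B": ["0.05"],
}


def repo_id(size, delta, org="THU-KEG"):
    """Build a Hub repository id, assuming the naming pattern above."""
    if delta not in AVAILABLE.get(size, []):
        raise ValueError(f"No released model for size={size}, delta={delta}")
    return f"{org}/AdaptThink-{size}-delta{delta}"


def load_adaptthink(size="7B", delta="0.05"):
    """Download the tokenizer and model from the Hugging Face Hub."""
    # Imported lazily so that repo_id() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = repo_id(size, delta)
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    return tokenizer, model
```

Calling `load_adaptthink("1.5B", "0.05")` would fetch the corresponding 1.5B checkpoint, provided the repository name follows the assumed pattern.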

📊 Evaluation Results

We list our evaluation results as follows:

1. Comparison with existing methods for efficient reasoning on mathematics datasets

2. NoThinking response ratio and accuracy across different difficulty levels on MATH500

3. Comparison of different $\delta$ values

4. Evaluation results on MMLU

📝 Citation

If you find our work useful, please consider citing AdaptThink:

@article{zhang2025adapt_think,
  title={AdaptThink: LLM Can Learn When to Think},
  author={Jiajie Zhang and Nianyi Lin and Lei Hou and Ling Feng and Juanzi Li},
  journal={arXiv preprint arXiv:2505.13417},
  url={https://arxiv.org/abs/2505.13417},
  year={2025}
}
