AdaptThink: LLM Can Learn When to Think
🤗 HF Collections • 💻 Github Repo • 📃 Paper
🔍 Table of Contents
🤖️ AdaptThink
We present AdapThink, a novel reinforcement learning (RL) algorithm that enables reasoning models to adaptively choose between Thinking and NoThinking modes according to the difficulty of each input problem, thereby achieving automatic hybrid reasoning. Specifically, the model engages in thinking only when the problem is determined to be challenging; for other simple question, it will bypass the thinking process and directly produce a concise final solution. This approach substantially reduces inference costs while further improving overall performance.
⚙️ Released Models
All Available Datasets and Models
We apply the AdaptThink algorithm on DeepSeek-R1-Distill-Qwen-1.5B with $\delta$ from 0 to 0.1, and DeepSeek-R1-Distill-Qwen-7B with $\delta=0.05$. A larger $\large$ results in a higher proportion of NoThinking responses, which reduces more inference costs but also diminish the resultant improvement in accuracy.
All the trained models are available on HuggingFace.
Name | HF Repo |
---|---|
AdaptThink-1.5B-delta0 | 🤗 HF Repo |
AdaptThink-1.5B-delta0.01 | 🤗 HF Repo |
AdaptThink-1.5B-delta0.02 | 🤗 HF Repo |
AdaptThink-1.5B-delta0.05 | 🤗 HF Repo |
AdaptThink-1.5B-delta0.075 | 🤗 HF Repo |
AdaptThink-1.5B-delta0.1 | 🤗 HF Repo |
AdaptThink-7B-delta0.05 | 🤗 HF Repo |
📊 Evaluation Results
We list our evaluation results as follows:
1. Comparison with existing methods for efficient reasoning on mathematics datasets
2. Nothinking responses ratio and accuracy across different difficulty levels on MATH500
3. Comparison of different $\delta$ values
4. Evaluation results on MMLU

📝 Citation
If you find our work useful, please consider citing LongReward:
@article{zhang2025adapt_think,
title = {AdaptThink: LLM Can Learn When to Think}
author={Jiajie Zhang and Nianyi Lin and Lei Hou and Ling Feng and Juanzi Li},
journal={arXiv preprint arXiv: 2505.13417},
url={https://arxiv.org/abs/2505.13417}
year={2025}
}
- Downloads last month
- 1
Model tree for THU-KEG/AdaptThink-7B-delta0.05
Base model
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B