Abstract
DeepConf enhances reasoning efficiency and performance by filtering low-quality reasoning traces using model-internal confidence signals, improving accuracy while reducing the number of generated tokens.
Large Language Models (LLMs) have shown great potential in reasoning tasks through test-time scaling methods like self-consistency with majority voting. However, this approach often leads to diminishing returns in accuracy and high computational overhead. To address these challenges, we introduce Deep Think with Confidence (DeepConf), a simple yet powerful method that enhances both reasoning efficiency and performance at test time. DeepConf leverages model-internal confidence signals to dynamically filter out low-quality reasoning traces during or after generation. It requires no additional model training or hyperparameter tuning and can be seamlessly integrated into existing serving frameworks. We evaluate DeepConf across a variety of reasoning tasks and the latest open-source models, including Qwen 3 and GPT-OSS series. Notably, on challenging benchmarks such as AIME 2025, DeepConf@512 achieves up to 99.9% accuracy and reduces generated tokens by up to 84.7% compared to full parallel thinking.
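The abstract describes two ingredients: a per-trace confidence score derived from the model's own internal signals, and a filtering step applied before (or instead of) plain majority voting. Below is a minimal sketch of the offline variant, assuming confidence is simply the mean token log-probability of a trace; this confidence proxy, the `keep_ratio` knob, and the exponential vote weighting are illustrative assumptions, not the paper's exact formulation.

```python
import math
from collections import defaultdict

def trace_confidence(token_logprobs):
    """Confidence proxy for one reasoning trace: mean token log-probability.
    (Illustrative; the paper derives confidence from model-internal signals,
    which may be defined differently.)"""
    return sum(token_logprobs) / len(token_logprobs)

def deepconf_offline_vote(traces, keep_ratio=0.5):
    """Offline filtering plus confidence-weighted majority vote.

    traces: list of (final_answer, token_logprobs) pairs from parallel samples.
    keep_ratio: fraction of highest-confidence traces to keep (assumed knob).
    """
    # Rank traces by confidence, highest first.
    scored = sorted(((trace_confidence(lps), ans) for ans, lps in traces),
                    reverse=True)
    kept = scored[: max(1, int(len(scored) * keep_ratio))]

    # Surviving traces vote on the final answer, weighted by confidence.
    votes = defaultdict(float)
    for conf, ans in kept:
        votes[ans] += math.exp(conf)
    return max(votes, key=votes.get)
```

For example, `deepconf_offline_vote([("42", [-0.1, -0.2]), ("41", [-2.0, -3.0])])` keeps only the higher-confidence trace and returns `"42"`.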
Community
Deep Think with Confidence (DeepConf) is a parallel thinking method that enhances both LLM reasoning performance and efficiency at test time. It leverages model-internal confidence signals to dynamically filter low-quality reasoning traces during or after generation. It requires no additional model training or hyperparameter tuning and can be seamlessly integrated into existing serving frameworks. It achieves up to 99.9% accuracy on AIME 2025 while reducing generated tokens by up to 84.7% compared to standard parallel thinking approaches.
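The comment above notes that filtering can also happen during generation, which is where the token savings would come from: low-confidence traces are aborted before they finish. A minimal sketch of such an online check follows; the sliding-window size and threshold are illustrative assumptions, not values from the paper.

```python
def should_stop_early(token_logprobs, window=64, threshold=-2.5):
    """Online check, called after each newly generated token: abort the trace
    if the mean log-probability over the last `window` tokens falls below
    `threshold`. Both hyperparameters here are illustrative assumptions."""
    if len(token_logprobs) < window:
        return False  # not enough recent tokens yet to judge the trace
    recent = token_logprobs[-window:]
    return sum(recent) / window < threshold
```

A serving loop would call this after each decoding step and stop generating the trace as soon as it returns True, so the remaining tokens are never produced.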
We also proposed a similar idea in our previous work on efficient test-time scaling (https://arxiv.org/pdf/2503.00031).
This is an automated message from the Librarian Bot. I found the following papers similar to this paper, recommended by the Semantic Scholar API:
- Efficient Reasoning for Large Reasoning Language Models via Certainty-Guided Reflection Suppression (2025)
- MUR: Momentum Uncertainty guided Reasoning for Large Language Models (2025)
- Adaptive Termination for Multi-round Parallel Reasoning: An Universal Semantic Entropy-Guided Framework (2025)
- Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs (2025)
- Lost at the Beginning of Reasoning (2025)
- Accelerating LLM Reasoning via Early Rejection with Partial Reward Modeling (2025)
- Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal (2025)