Papers
arxiv:2505.16122

Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning

Published on May 22
· Submitted by junhongmit on Jun 3
Authors:
Abstract

The Plan-and-Budget framework enhances reasoning efficiency in LLMs by allocating token budgets based on estimated sub-question complexity, improving accuracy, reducing token usage, and boosting the $E^3$ metric.

AI-generated summary

Large Language Models (LLMs) have achieved remarkable success in complex reasoning tasks, but their inference remains computationally inefficient. We observe a common failure mode in many prevalent LLMs, overthinking, where models generate verbose and tangential reasoning traces even for simple queries. Recent works have tried to mitigate this by enforcing fixed token budgets; however, this can lead to underthinking, especially on harder problems. Through empirical analysis, we identify that this inefficiency often stems from unclear problem-solving strategies. To formalize this, we develop a theoretical model, BBAM (Bayesian Budget Allocation Model), which models reasoning as a sequence of sub-questions with varying uncertainty, and introduce the E^3 metric to capture the trade-off between correctness and computational efficiency. Building on theoretical results from BBAM, we propose Plan-and-Budget, a model-agnostic, test-time framework that decomposes complex queries into sub-questions and allocates token budgets based on estimated complexity using adaptive scheduling. Plan-and-Budget improves reasoning efficiency across a range of tasks and models, achieving up to +70% accuracy gains, -39% token reduction, and +187.5% improvement in E^3. Notably, it elevates a smaller model (DS-Qwen-32B) to match the efficiency of a larger model (DS-LLaMA-70B), demonstrating Plan-and-Budget's ability to close performance gaps without retraining. Our code is available at anonymous.4open.science/r/P-and-B-6513/.

Community

Paper author · Paper submitter · edited 2 days ago

🔗 Project Code: https://github.com/junhongmit/P-and-B

🚀 In this work, we introduce Plan-And-Budget, a test-time framework that improves reasoning efficiency in LLMs by combining structured planning and uncertainty-aware compute allocation. No retraining needed—just smarter token usage.

😆 Takeaways:

  1. Reasoning Miscalibration in LLMs
    LLMs often overthink easy queries (verbose, wasteful) or underthink hard ones (premature, incorrect). We identify this mismatch as a core inefficiency in current inference.

  2. Bayesian Budget Allocation Model (BBAM)
    We formalize reasoning as a sequence of sub-questions with varying uncertainty. BBAM allocates more tokens to epistemically uncertain steps, and fewer to those dominated by aleatoric noise.

  3. Plan-And-Budget Framework

    • Plan step: Decompose queries into sub-questions using lightweight planning.
    • Budget step: Allocate tokens using decay-based heuristics (linear, polynomial, cosine, etc.).
    The framework is model-agnostic, inference-only, and compatible with any LLM.
  4. ℰ³ Score: A New Efficiency-Aware Effectiveness Evaluation Metric
    Defined as ℰ³ = Accuracy² / Tokens, this metric rewards high-accuracy, low-token reasoning—penalizing both wasteful and careless inferences.

  5. Strong Empirical Results
    We extensively evaluate Plan-And-Budget on 3 types of reasoning tasks and 4 state-of-the-art reasoning LLMs, and observe consistent improvements:

    • 📈 Accuracy: up to +70%
    • 🔻 Token use: up to –39%
    • 🔥 ℰ³: up to +187.5%

On the agentic planning task, our method lifts a small 32B model to match the efficiency of a 70B model without any fine-tuning.
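As a quick reference, the ℰ³ score from takeaway 4 reduces to a one-liner (a minimal sketch; the function name and the input guard are our own, not from the paper):

```python
def e3_score(accuracy: float, tokens: float) -> float:
    """E^3 = Accuracy^2 / Tokens: squares accuracy so correctness dominates,
    then divides by token count to penalize verbose reasoning."""
    if tokens <= 0:
        raise ValueError("token count must be positive")
    return accuracy ** 2 / tokens

# A concise trace can beat a slightly more accurate but verbose one:
concise = e3_score(accuracy=0.80, tokens=500)   # ≈ 0.00128
verbose = e3_score(accuracy=0.90, tokens=1000)  # ≈ 0.00081
```

Squaring accuracy means that halving token usage only pays off if accuracy drops by less than a factor of √2, so the metric never rewards careless, ultra-short answers.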

  6. Early Steps Matter Most
    Polynomial and cosine decay schedulers work best, confirming that early reasoning steps—where uncertainty is highest—deserve more compute.
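To make the decay schedules concrete, here is one way such a scheduler could look. This is a sketch under our own assumptions: the post names the schedule families (linear, polynomial, cosine) but not the exact formulas, so the weight functions and the `allocate_budgets` helper below are illustrative.

```python
import math

def allocate_budgets(total_budget: int, n_steps: int, schedule: str = "cosine") -> list[int]:
    """Split a total token budget across reasoning steps, front-loading
    early steps where uncertainty is assumed to be highest."""
    if schedule == "linear":
        weights = [n_steps - i for i in range(n_steps)]
    elif schedule == "polynomial":  # quadratic decay as one example
        weights = [(n_steps - i) ** 2 for i in range(n_steps)]
    elif schedule == "cosine":      # cosine decay from 1 toward 0
        weights = [math.cos(i / n_steps * math.pi / 2) for i in range(n_steps)]
    else:
        raise ValueError(f"unknown schedule: {schedule}")
    total_w = sum(weights)
    # Normalize weights into per-step budgets; every step gets at least 1 token.
    return [max(1, round(total_budget * w / total_w)) for w in weights]
```

For example, `allocate_budgets(1000, 4, "linear")` yields `[400, 300, 200, 100]`: the first sub-question gets four times the budget of the last, matching the intuition that early, high-uncertainty steps deserve more compute.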

🧠 Plan before you think. Budget as you go. Smarter reasoning starts now.
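Takeaway 2 (BBAM) can likewise be sketched as uncertainty-proportional allocation. The uncertainty estimates, the noise floor, and the proportional rule below are illustrative assumptions, not the paper's exact formulation:

```python
def bbam_allocate(total_budget: int, epistemic: list[float], noise_floor: float = 0.05) -> list[int]:
    """Toy BBAM-style allocation: give each sub-question tokens in proportion
    to its estimated epistemic (reducible) uncertainty. Steps whose uncertainty
    is dominated by aleatoric noise are clamped to a small floor share, since
    extra tokens cannot reduce that kind of uncertainty."""
    weights = [max(u, noise_floor) for u in epistemic]
    total_w = sum(weights)
    return [max(1, round(total_budget * w / total_w)) for w in weights]
```

With estimates `[0.8, 0.3, 0.1]` and a 900-token budget, this yields roughly `[600, 225, 75]`: most of the compute goes to the sub-question the model is least sure how to solve.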


