Compressing Chain-of-Thought in LLMs via Step Entropy
Abstract
A novel CoT compression framework using step entropy and a two-stage training strategy enhances LLM inference efficiency without significantly reducing accuracy.
Large Language Models (LLMs) using Chain-of-Thought (CoT) prompting excel at complex reasoning but generate verbose thought processes with considerable redundancy, leading to increased inference costs and reduced efficiency. We introduce a novel CoT compression framework based on step entropy, a metric that quantifies the informational contribution of individual reasoning steps in order to identify redundancy. Through theoretical analysis and extensive empirical validation on mathematical reasoning benchmarks, we demonstrate that steps with low entropy are indeed highly redundant. Our experiments reveal that an astonishing 80% of low-entropy intermediate steps can be pruned with only minor degradation in final answer accuracy across DeepSeek-R1-7B, 14B, and Qwen3-8B. This finding sharply contrasts with random or high-entropy pruning, which severely impairs reasoning performance. Building on this, we propose a novel two-stage training strategy combining Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO) reinforcement learning. This approach enables LLMs to autonomously learn to generate compressed CoTs during inference by strategically incorporating [SKIP] tokens. Our method significantly enhances LLM inference efficiency while rigorously preserving accuracy, offering profound implications for practical LLM deployment and a deeper understanding of reasoning structures.
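The page does not include an implementation, but the core idea lends itself to a short illustration. The sketch below (plain PyTorch) assumes that a step's entropy is obtained by summing token-level Shannon entropies over the tokens of that step, and that compression keeps only the highest-entropy steps in their original order; the function names, the sum aggregator, and the 20% keep ratio are assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def token_entropies(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of the next-token distribution at each position.

    logits: [seq_len, vocab_size] from a single forward pass over the CoT.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    return -(probs * log_probs).sum(dim=-1)  # [seq_len]

def step_entropies(entropies: torch.Tensor,
                   step_spans: list[tuple[int, int]]) -> list[float]:
    """Aggregate token-level entropy over each reasoning step.

    step_spans: (start, end) token indices of each step, e.g. obtained by
    splitting the CoT on step delimiters. Summation is an assumption here.
    """
    return [entropies[s:e].sum().item() for s, e in step_spans]

def prune_low_entropy_steps(steps: list[str],
                            scores: list[float],
                            keep_ratio: float = 0.2) -> list[str]:
    """Drop the lowest-entropy steps (the paper prunes up to ~80% of them),
    keeping the highest-entropy steps in their original order."""
    k = max(1, round(keep_ratio * len(steps)))
    ranked = sorted(range(len(steps)), key=lambda i: scores[i], reverse=True)
    keep_idx = sorted(ranked[:k])
    return [steps[i] for i in keep_idx]
```

In this reading, the per-step score is computed offline from a single forward pass over the full CoT, so the pruning decision adds no extra generation cost; only the two-stage training described below is needed to make the model produce the compressed trace directly.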
Community
Researchers introduce a novel method to compress verbose Chain-of-Thought (CoT) reasoning in Large Language Models by identifying and pruning redundant steps using "step entropy", achieving 35-57% token reduction while maintaining accuracy.
Key Contributions:
Step Entropy Metric: A principled way to measure the informational contribution of individual reasoning steps by aggregating token-level entropy during generation.
Surprising Finding: Up to 80% of low-entropy reasoning steps can be removed with only minor accuracy degradation, while high-entropy steps are crucial and cannot be pruned.
Practical Impact: Achieves substantial efficiency gains across multiple models (DeepSeek-R1: 29.7-43.5% token reduction; Qwen3-8B: 16.2-44.9% token reduction) while maintaining or improving accuracy on mathematical reasoning benchmarks.
Two-Stage Training: Combines Supervised Fine-Tuning with reinforcement learning (GRPO) to teach models to autonomously generate compressed reasoning during inference using [SKIP] tokens (a rough sketch of such targets follows this list).
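As a purely illustrative sketch of the training-target side, the snippet below shows one way pruned steps could be replaced with a [SKIP] marker when building SFT targets. The exact target format used in the paper (for example, whether consecutive pruned steps collapse into a single [SKIP]) is not specified on this page, so the helper name and every formatting choice here are assumptions.

```python
def build_skip_target(steps: list[str],
                      keep_idx: set[int],
                      skip_token: str = "[SKIP]") -> str:
    """Build an SFT target in which each run of pruned (low-entropy) steps
    is replaced by a single skip marker, preserving the kept steps' order.

    steps:    all original reasoning steps of the CoT.
    keep_idx: indices of the high-entropy steps retained after pruning.
    """
    out, prev_skipped = [], False
    for i, step in enumerate(steps):
        if i in keep_idx:
            out.append(step)
            prev_skipped = False
        elif not prev_skipped:
            out.append(skip_token)  # collapse consecutive pruned steps (assumption)
            prev_skipped = True
    return "\n\n".join(out)
```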
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal (2025)
- SynAdapt: Learning Adaptive Reasoning in Large Language Models via Synthetic Continuous Chain-of-Thought (2025)
- Optimizing Length Compression in Large Reasoning Models (2025)
- Reconsidering Overthinking: Penalizing Internal and External Redundancy in CoT Reasoning (2025)
- Think Clearly: Improving Reasoning via Redundant Token Pruning (2025)
- Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement (2025)
- SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning (2025)