arxiv:2507.08799

KV Cache Steering for Inducing Reasoning in Small Language Models

Published on Jul 11 · Submitted by yukimasano on Jul 14
Abstract

Cache steering improves reasoning in language models through a single intervention in the key-value cache, enhancing both reasoning structure and task performance.

AI-generated summary

We propose cache steering, a lightweight method for implicit steering of language models via a one-shot intervention applied directly to the key-value cache. To validate its effectiveness, we apply cache steering to induce chain-of-thought reasoning in small language models. Our approach leverages GPT-4o-generated reasoning traces to construct steering vectors that shift model behavior toward more explicit, multi-step reasoning without fine-tuning or prompt modifications. Experimental evaluations on diverse reasoning benchmarks demonstrate that cache steering improves both the qualitative structure of model reasoning and quantitative task performance. Compared to prior activation steering techniques that require continuous interventions, our one-shot cache steering offers substantial advantages in terms of hyperparameter stability, inference-time efficiency, and ease of integration, making it a more robust and practical solution for controlled generation.
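The abstract describes extracting a steering vector from contrasting runs (with and without reasoning traces) and applying it once to the key-value cache before decoding. A minimal toy sketch of that idea, assuming a contrastive mean-difference construction and a single additive edit (all function names and the vector arithmetic are illustrative, not the paper's actual implementation):

```python
# Hypothetical sketch of one-shot KV-cache steering (names are illustrative,
# not the paper's API). A steering vector is the mean difference between
# cached states from reasoning-style vs. plain prompts; it is added once to
# the prompt's cached entries, after which decoding needs no further edits.

def mean_vector(caches):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(caches)
    return [sum(c[i] for c in caches) / n for i in range(len(caches[0]))]

def build_steering_vector(pos_caches, neg_caches):
    """Contrast cached states from reasoning-trace vs. plain completions."""
    pos = mean_vector(pos_caches)
    neg = mean_vector(neg_caches)
    return [p - q for p, q in zip(pos, neg)]

def apply_cache_steering(kv_cache, steer, scale=1.0):
    """One-shot edit: add the scaled steering vector to every cached entry."""
    return [[x + scale * s for x, s in zip(entry, steer)] for entry in kv_cache]

# Toy example with 2-dim "cache entries".
pos = [[1.0, 2.0], [3.0, 2.0]]   # from reasoning-trace prompts (illustrative)
neg = [[0.0, 1.0], [2.0, 1.0]]   # from plain prompts (illustrative)
steer = build_steering_vector(pos, neg)
cache = [[0.5, 0.5]]
steered = apply_cache_steering(cache, steer, scale=0.5)
print(steered)  # [[1.0, 1.0]]
```

Because the edit happens once, before generation, there is no per-token overhead during decoding, which is the efficiency advantage the abstract claims over continuous activation-steering interventions.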

Community

Paper submitter

The paper proposes an alternative to activation steering: instead, it steers the KV cache inside an LLM to induce reasoning in small LLMs.

This is brilliant. Well done all.


