arxiv:2504.20769

Chain-of-Defensive-Thought: Structured Reasoning Elicits Robustness in Large Language Models against Reference Corruption

Published on Apr 29

· Submitted by

Authors:

Abstract

Chain-of-thought prompting has demonstrated great success in facilitating the reasoning abilities of large language models. In this work, we explore how these enhanced reasoning abilities can be exploited to improve the robustness of large language models in tasks that are not necessarily reasoning-focused. In particular, we show how a wide range of large language models exhibit significantly improved robustness against reference corruption using a simple method called chain-of-defensive-thought, where only a few exemplars with structured and defensive reasoning are provided as demonstrations. Empirically, the improvements can be astounding, especially given the simplicity and applicability of the method. For example, in the Natural Questions task, the accuracy of GPT-4o degrades from 60% to as low as 3% with standard prompting when 1 out of 10 references provided is corrupted with prompt injection attacks. In contrast, GPT-4o using chain-of-defensive-thought prompting maintains an accuracy of 50%.

View arXiv page View PDF Add to collection

Community

wangwenxiao

Paper submitter about 11 hours ago

🛡️ Using Reasoning LLMs for Reliability

The world is investing heavily in reasoning LLMs — but 🤔 how can it help tasks that aren’t reasoning-intensive?

One angle:
Reasoning abilities (of LLMs) can be exploited for reliability!

We explored this and it’s surprisingly easy & surprisingly effective!
🔗 Read the paper

📚 Background

LLMs are naturally limited in up-to-date or specialized knowledge.
That’s why so many — including OpenAI and Google — augment them with external references (e.g., RAG, search, deep research).

However, when those references are compromised, LLM performance can break down — raising serious reliability concerns:

🔗 Zou et al. (2024)
🔗 Greshake et al. (2023)

🧠 Introducing Chain-of-Defensive-Thought

We propose a simple, prompting-only method called Chain-of-Defensive-Thought to enhance LLM robustness against corrupted external references.

No fine-tuning needed
Just a few exemplars with structured, defensive reasoning

Illustration:

📈 Key Results

Despite its simplicity, Chain-of-Defensive-Thought significantly improves LLM robustness across a wide range of models!

🚀 Why It Matters

Simple: Just prompting — no architecture changes.
Effective: Major reliability improvements.
Timely: Perfect for boosting systems based on RAG, search augmentation, and retrieval pipelines.

This could open up exciting new research directions with the rise of reasoning-optimized LLMs (e.g., OpenAI's o-series, DeepSeek R1). Thoughts?

librarian-bot

about 8 hours ago

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

Your need to confirm your account before you can post a new comment.

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2504.20769 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2504.20769 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2504.20769 in a Space README.md to link it from this page.