Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System
Abstract
Fact2Fiction is a poisoning attack framework that targets agentic fact-checking systems by exploiting their decomposition strategy and justifications, achieving higher attack success rates than existing methods.
State-of-the-art fact-checking systems combat misinformation at scale by employing autonomous LLM-based agents to decompose complex claims into smaller sub-claims, verify each sub-claim individually, and aggregate the partial results into verdicts with justifications (explanatory rationales for the verdicts). The security of these systems is crucial yet underexplored, as a compromised fact-checker can amplify misinformation. This work introduces Fact2Fiction, the first poisoning attack framework targeting such agentic fact-checking systems. Fact2Fiction mirrors the system's decomposition strategy and exploits its generated justifications to craft tailored malicious evidence that compromises sub-claim verification. Extensive experiments demonstrate that Fact2Fiction achieves 8.9%–21.2% higher attack success rates than state-of-the-art attacks across various poisoning budgets. Fact2Fiction exposes security weaknesses in current fact-checking systems and highlights the need for defensive countermeasures.
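To make the described pipeline concrete, below is a minimal Python sketch of the attack loop the abstract outlines: mirror the fact-checker's claim decomposition, use the system-generated justification for each sub-claim to tailor a poisoned evidence document, and stop once the poisoning budget is spent. Every function name here (decompose, preview_justification, craft_poison) is a hypothetical stand-in for illustration, not the authors' actual implementation or API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class PoisonedDoc:
    sub_claim: str   # the sub-claim this document is tailored to mislead
    text: str        # fabricated evidence pushing toward the target verdict

def fact2fiction_attack(
    claim: str,
    target_verdict: str,
    budget: int,
    decompose: Callable[[str], list[str]],              # hypothetical: mirrors the system's decomposition
    preview_justification: Callable[[str], str],        # hypothetical: the justification the attack exploits
    craft_poison: Callable[[str, str, str], str],       # hypothetical: LLM-crafted malicious evidence
) -> list[PoisonedDoc]:
    """Craft at most `budget` malicious evidence documents for one claim.

    Sketch only, under the assumptions stated above; the paper's actual
    prompt designs and budget-allocation strategy are not reproduced here.
    """
    poisons: list[PoisonedDoc] = []
    for sub_claim in decompose(claim):       # mirror the decomposition strategy
        if len(poisons) >= budget:           # respect the poisoning budget
            break
        justification = preview_justification(sub_claim)
        poisons.append(
            PoisonedDoc(sub_claim, craft_poison(sub_claim, justification, target_verdict))
        )
    # These documents would then be injected into the evidence corpus that the
    # fact-checker's retriever searches during sub-claim verification.
    return poisons
```

The design point the sketch highlights is that the attack operates per sub-claim rather than per claim, which is why exploiting the system's own decomposition and justifications yields tailored, harder-to-dismiss poisons.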
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- Token-Level Precise Attack on RAG: Searching for the Best Alternatives to Mislead Generation (2025)
- Context Manipulation Attacks: Web Agents are Susceptible to Corrupted Memory (2025)
- The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover (2025)
- From Prompt Injections to Protocol Exploits: Threats in LLM-Powered AI Agents Workflows (2025)
- Paper Summary Attack: Jailbreaking LLMs through LLM Safety Papers (2025)
- MAD-Spear: A Conformity-Driven Prompt Injection Attack on Multi-Agent Debate Systems (2025)
- SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression (2025)