Papers
arxiv:2305.04812

A Drop of Ink Makes a Million Think: The Spread of False Information in Large Language Models

Published on May 8, 2023
Authors:
,
,
,
,
,

Abstract

Large language models (LLMs) have gained increasing prominence in artificial intelligence, making a profound impact on society and various industries like business and science. However, the presence of false information on the internet and in text corpus poses a significant risk to the reliability and safety of LLMs, underscoring the urgent need to understand the mechanisms of how false information influences the behaviors of LLMs. In this paper, we dive into this problem and investigate how false information spreads in LLMs and affects related responses. Specifically, in our series of experiments, we investigate different factors that can influence the spread of information in LLMs by comparing three degrees of information relevance (direct, indirect, and peripheral), four information source styles (Twitter, web blogs, news reports, and research papers) and two common knowledge injection paradigms (in-context injection and learning-based injection). The experimental results show that (1)False information will spread and contaminate related memories in LLMs via a semantic diffusion process, i.e., false information has global detrimental effects beyond its direct impact. (2)Current LLMs are susceptible to authority bias, i.e., LLMs are more likely to follow false information presented in trustworthy styles such as news reports and research papers, which usually cause deeper and wider pollution of information. (3)Current LLMs are more sensitive to false information through in-context injection than through learning-based injection, which severely challenges the reliability and safety of LLMs even when all training data are trusty and correct. The above findings raise the need for new false information defense algorithms to address the global impact of false information, and new alignment algorithms to unbiasedly lead LLMs to follow essential human values rather than superficial patterns.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2305.04812 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2305.04812 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2305.04812 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.