arxiv:2502.11962

Balancing Truthfulness and Informativeness with Uncertainty-Aware Instruction Fine-Tuning

Published on Feb 17, 2025

Abstract

New paradigms UNIT_{cut} and UNIT_{ref} address the trade-off between informativeness and truthfulness in instruction fine-tuning of large language models.

AI-generated summary

Instruction fine-tuning (IFT) can increase the informativeness of large language models (LLMs), but may reduce their truthfulness. This trade-off arises because IFT steers LLMs to generate responses containing long-tail knowledge that was not well covered during pre-training. As a result, models become more informative but less accurate when generalizing to unseen tasks. In this paper, we empirically demonstrate how unfamiliar knowledge in IFT datasets can negatively affect the truthfulness of LLMs, and we introduce two new IFT paradigms, UNIT_{cut} and UNIT_{ref}, to address this issue. UNIT_{cut} identifies and removes unfamiliar knowledge from IFT datasets to mitigate its impact on model truthfulness, whereas UNIT_{ref} trains LLMs to recognize their uncertainty and explicitly indicate it at the end of their responses. Our experiments show that UNIT_{cut} substantially improves LLM truthfulness, while UNIT_{ref} maintains high informativeness and reduces hallucinations by distinguishing between confident and uncertain statements.
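
The two paradigms can be pictured as data-preprocessing steps applied before fine-tuning. Below is a minimal, hypothetical sketch in Python of what such a pipeline might look like; the familiarity score, threshold, and the wording of the uncertainty note are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the two UNIT-style preprocessing strategies.
# The familiarity score, threshold, and uncertainty note are assumptions
# for illustration only, not the method described in the paper.

from dataclasses import dataclass


@dataclass
class IFTExample:
    prompt: str
    response: str
    familiarity: float  # assumed score of how well the base model knows the facts used


def unit_cut(dataset: list[IFTExample], threshold: float = 0.5) -> list[IFTExample]:
    """UNIT_cut-style filtering: drop examples built on unfamiliar knowledge."""
    return [ex for ex in dataset if ex.familiarity >= threshold]


def unit_ref(dataset: list[IFTExample], threshold: float = 0.5) -> list[IFTExample]:
    """UNIT_ref-style tagging: keep every example, but append an explicit
    uncertainty note to responses that rely on unfamiliar knowledge."""
    tagged = []
    for ex in dataset:
        response = ex.response
        if ex.familiarity < threshold:
            response += "\n\n[Uncertain: I am not confident about some statements above.]"
        tagged.append(IFTExample(ex.prompt, response, ex.familiarity))
    return tagged
```

In this reading, UNIT_cut trades coverage for truthfulness by shrinking the training set, while UNIT_ref preserves informativeness and instead teaches the model to flag its own uncertain statements.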
