arxiv:2406.16008

Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization

Published on Jun 23, 2024

Submitted by cydhsieh01 on Jun 25, 2024

Authors: Cheng-Yu Hsieh, Yung-Sung Chuang, Zifeng Wang, Chen-Yu Lee

Abstract

Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon is known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon. In doing so, we establish a connection between lost-in-the-middle and LLMs' intrinsic attention bias: LLMs exhibit a U-shaped attention bias in which the tokens at the beginning and at the end of the input receive higher attention, regardless of their relevance. Second, we mitigate this positional bias through a calibration mechanism, found-in-the-middle, that allows the model to attend to contexts faithfully according to their relevance, even when they are in the middle. Third, we show that found-in-the-middle not only achieves better performance in locating relevant information within a long context, but also leads to improved retrieval-augmented generation (RAG) performance across various tasks, outperforming existing methods by up to 15 percentage points. These findings open up future directions in understanding LLM attention bias and its potential consequences.
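The abstract does not spell out how the calibration works, so the following is only a toy illustration of the general idea, not the authors' method. It assumes, purely for illustration, that the attention a document receives is the sum of its relevance and a U-shaped positional bias, and that the bias can be estimated by measuring attention on a content-free filler document at each position. The names `u_shaped_bias`, `observed_attention`, and `calibrate` are hypothetical.

```python
def u_shaped_bias(n_positions, strength=1.0):
    """Hypothetical U-shaped bias: largest at both ends, zero in the middle.

    Requires n_positions >= 2.
    """
    mid = (n_positions - 1) / 2
    return [strength * ((k - mid) / mid) ** 2 for k in range(n_positions)]


def observed_attention(relevance, bias):
    """Assumed additive model: observed score = relevance + positional bias."""
    return [r + b for r, b in zip(relevance, bias)]


def calibrate(observed, baseline):
    """Subtract the per-position baseline measured with a filler document."""
    return [o - b for o, b in zip(observed, baseline)]


def argmax(values):
    """Index of the first maximum value."""
    return max(range(len(values)), key=values.__getitem__)


n = 7
bias = u_shaped_bias(n)

# Seven documents; only the one in the middle (index 3) is truly relevant.
relevance = [0.1] * n
relevance[3] = 0.6

observed = observed_attention(relevance, bias)

# Baseline: the attention a zero-relevance filler document would receive
# at each position, which under the additive assumption is the bias itself.
baseline = observed_attention([0.0] * n, bias)

calibrated = calibrate(observed, baseline)

print("top document before calibration:", argmax(observed))
print("top document after calibration: ", argmax(calibrated))
```

Under these assumptions, the uncalibrated scores favor a document at the edge of the context, while the calibrated scores rank the middle document first. Real attention weights are unlikely to decompose this cleanly, so the sketch should be read as intuition rather than as a recipe.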


Community

cydhsieh01

Paper author Paper submitter Jun 25, 2024

Connecting the lost-in-the-middle phenomenon to LLMs' intrinsic positional attention bias, and proposing a calibration mechanism that mitigates the bias and improves models' long-context performance.

