Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Abstract
This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. A key component in our proposed approach is a new attention technique dubbed Infini-attention. The Infini-attention incorporates a compressive memory into the vanilla attention mechanism and builds in both masked local attention and long-term linear attention mechanisms in a single Transformer block. We demonstrate the effectiveness of our approach on long-context language modeling benchmarks, 1M sequence length passkey context block retrieval and 500K length book summarization tasks with 1B and 8B LLMs. Our approach introduces minimal bounded memory parameters and enables fast streaming inference for LLMs.
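To make the mechanism concrete, here is a minimal single-head PyTorch sketch of the idea described above: each segment blends standard masked local attention with a read from a fixed-size compressive memory via a learned sigmoid gate, then updates that memory with a linear-attention-style rule. The function names, shapes, and exact update form are illustrative assumptions for this sketch, not the authors' released implementation.

import torch
import torch.nn.functional as F

def elu_plus_one(x):
    # Nonlinearity commonly used for linear attention; an assumption here.
    return F.elu(x) + 1.0

def infini_attention_segment(q, k, v, memory, z, beta, causal_mask):
    """One Infini-attention step for a single head on one segment (sketch).
    q, k, v: (seg_len, d) projections for the current segment
    memory:  (d, d) compressive memory carried over from earlier segments
    z:       (d,)   normalization term carried with the memory
    beta:    learnable gating scalar (tensor with one element)
    """
    # 1) Retrieve long-term context from the compressive memory.
    sig_q = elu_plus_one(q)                                        # (seg_len, d)
    a_mem = (sig_q @ memory) / (sig_q @ z).clamp(min=1e-6).unsqueeze(-1)

    # 2) Standard masked (local) dot-product attention within the segment.
    scores = (q @ k.T) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~causal_mask, float("-inf"))
    a_local = scores.softmax(dim=-1) @ v                           # (seg_len, d)

    # 3) Blend long-term and local attention with a learned sigmoid gate.
    g = torch.sigmoid(beta)
    out = g * a_mem + (1.0 - g) * a_local

    # 4) Update the compressive memory and its normalization term.
    sig_k = elu_plus_one(k)
    memory = memory + sig_k.T @ v                                  # (d, d)
    z = z + sig_k.sum(dim=0)                                       # (d,)
    return out, memory, z

# Illustrative usage: stream segments of a long sequence while the memory
# stays a fixed d x d matrix, i.e. bounded regardless of total length.
seg_len, d = 16, 64
memory, z = torch.zeros(d, d), torch.zeros(d)
beta = torch.zeros(1)  # would be a trained parameter in practice
mask = torch.ones(seg_len, seg_len).tril().bool()
for _ in range(4):  # four segments of a longer stream
    q = k = v = torch.randn(seg_len, d)
    out, memory, z = infini_attention_segment(q, k, v, memory, z, beta, mask)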
Community
Awesome work! Any chance of publishing the code too?
Excellent work! I'm curious, is the gating scalar β the only additional parameter that requires training?
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Long-Context Language Modeling with Parallel Context Encoding (2024)
- LongHeads: Multi-Head Attention is Secretly a Long Context Processor (2024)
- CAMELoT: Towards Large Language Models with Training-Free Consolidated Associative Memory (2024)
- Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference (2024)
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models (2024)
I'm working on a PyTorch implementation. Come join me in the repo if you want to help!
https://github.com/jlamprou/Infini-Attention
Here's a fully working implementation repo!
https://github.com/Beomi/InfiniTransformer
(@glamprou's repo inspired me a lot! Thanks ☺️)
Llama-3 is out!
I updated my repo (https://github.com/Beomi/InfiniTransformer) to train Llama-3 with 1M seq len 🤩
An implementation of Infini-attention on Gemma 2B for 10M context - https://github.com/mustafaaljadery/gemma-2B-10M
Unlocking Infinite Context: Meet Infini-attention for Transformers!
Links:
Subscribe: https://www.youtube.com/@Arxflix
Twitter: https://x.com/arxflix
LMNT (Partner): https://lmnt.com/