Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts Paper • 2506.05229 • Published Jun 5 • 37 • 3
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack Paper • 2406.10149 • Published Jun 14, 2024 • 53 • 4