Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
Abstract
A Thread Inference Model (TIM) and its runtime (TIMRUN) enable long-horizon reasoning in LLMs by using reasoning trees and key-value state retention, overcoming context and memory limitations.
To break the context limits of large language models (LLMs) that bottleneck reasoning accuracy and efficiency, we propose the Thread Inference Model (TIM), a family of LLMs trained for recursive and decompositional problem solving, and TIMRUN, an inference runtime that enables long-horizon structured reasoning beyond context limits. Together, TIM hosted on TIMRUN supports virtually unlimited working memory and multi-hop tool calls within a single language-model inference, overcoming output limits, positional-embedding constraints, and GPU-memory bottlenecks. This is achieved by modeling natural language as reasoning trees, measured by both length and depth, rather than as linear sequences. The reasoning trees consist of tasks with thoughts, recursive subtasks, and conclusions, based on the concept we proposed in Schroeder et al. (2025). During generation, we maintain a working memory that retains only the key-value states of the most relevant context tokens, selected by a rule-based subtask-pruning mechanism, enabling positional embeddings and GPU memory pages to be reused throughout reasoning. Experimental results show that our system sustains high inference throughput even when manipulating up to 90% of the KV cache in GPU memory. It also delivers accurate reasoning on mathematical tasks and handles information-retrieval challenges that require long-horizon reasoning and multi-hop tool use.
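The abstract's task schema (a thought, recursive subtasks, a conclusion) and the rule-based subtask pruning can be sketched in a few lines. This is an illustrative toy, not TIM's or TIMRUN's implementation; all names (`Task`, `retained_context`) are hypothetical, and the pruning rule shown (a concluded task keeps only its thought and conclusion) is one plausible reading of the working-memory idea:

```python
# Toy reasoning tree following the abstract's task schema; hypothetical,
# not the paper's actual code.
from dataclasses import dataclass, field

@dataclass
class Task:
    thought: str
    subtasks: list["Task"] = field(default_factory=list)
    conclusion: str = ""  # empty string means the task is still open

def retained_context(task: Task) -> list[str]:
    """Tokens kept in working memory after rule-based subtask pruning.

    Assumed rule: once a task has a conclusion, its subtasks' tokens are
    pruned and only the thought and conclusion survive; open tasks
    recurse into their subtasks.
    """
    if task.conclusion:  # subtree finished -> prune its subtasks
        return [task.thought, task.conclusion]
    kept = [task.thought]
    for sub in task.subtasks:
        kept.extend(retained_context(sub))
    return kept

root = Task(
    thought="Answer a multi-hop question.",
    subtasks=[
        Task(thought="Find entity A", conclusion="A = Paris"),
        Task(thought="Look up fact about A",
             subtasks=[Task(thought="call search tool", conclusion="pop = 2.1M")],
             conclusion="Population is 2.1M"),
    ],
)
# The root is still open, so its concluded children each collapse to
# thought + conclusion, and the finished inner subtask is pruned away.
print(retained_context(root))
```

Because pruned spans free their key-value states, the retained context stays short no matter how deep the tree grows, which is what lets positional embeddings and GPU memory pages be recycled.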
Community
I am very ready to stop having my workflows interrupted because I need to start a new chat - really excited about the potential to have LLM assistants that aren't memory constrained! Can't wait to see where this goes.
Give me the memory! This is awesome.
Ok this is promising!
Really excited about this! I've faced this problem myself with the startup I'm building!
Great research.
Are you keeping the dataset and TIMRUN engine under wraps or are there plans to release?
The reason I ask is that I'm interested in how you parse the hierarchy, considering it's autoregressive. I.e., do you output the entire hierarchy depth-first in its entirety and then execute the tools once the plan is fixed? Or do you go breadth-first one level at a time and wait for the tool results, so that the plan dynamically unfolds at "runtime" with the emergent ability to self-correct? If so, how do you deal with dependencies between nodes, critical paths, etc.?
Thank you for your interest and question! The structure dynamically unfolds with autoregressive generation, without any prior settings for depth- or breadth-first search and without pauses. We handle tool calls and subtask pruning on the fly.
We will keep updating our repo and progressively releasing data and examples. We will decide when the best time is to release the system.
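The single-pass unfolding described in the reply - the runtime executing a tool the moment a call is emitted and pruning a subtask the moment it concludes - can be sketched as a small event loop. This is a hypothetical illustration (the event kinds and token protocol are invented, not TIMRUN's format):

```python
# Hypothetical sketch: a scripted "model" emits structure events
# autoregressively; the runtime reacts immediately, with no fixed plan
# and no breadth-first waiting. Later tokens can condition on earlier
# tool results already in context, which handles dependencies naturally.
def run(events, tools):
    context = []   # tokens currently kept in the KV cache
    stack = []     # start indices of open subtasks within `context`
    for kind, payload in events:
        if kind == "open":            # a subtask begins
            stack.append(len(context))
            context.append(f"<task:{payload}>")
        elif kind == "call":          # tool call: execute now, inject result
            context.append(tools[payload]())
        elif kind == "close":         # conclusion arrives: prune the span,
            start = stack.pop()       # keeping only the conclusion
            context[start:] = [payload]
    return context

tools = {"search": lambda: "result:42"}
events = [
    ("open", "find answer"),
    ("call", "search"),
    ("close", "answer is 42"),
]
print(run(events, tools))  # the finished subtask collapses to its conclusion
```

The point of the sketch is that nothing here requires a fixed plan up front: generation, tool execution, and pruning interleave in one pass, as the reply describes.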
arXiv explained breakdown of this paper 👉 https://arxivexplained.com/papers/beyond-context-limits-subconscious-threads-for-long-horizon-reasoning
Models citing this paper: 1