arxiv:2507.16784

Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning

Published on Jul 22
· Submitted by luohy on Jul 23
#1 Paper of the day

Abstract

A Thread Inference Model (TIM) and its runtime (TIMRUN) enable long-horizon reasoning in LLMs by using reasoning trees and key-value state retention, overcoming context and memory limitations.

AI-generated summary

To break the context limits of large language models (LLMs) that bottleneck reasoning accuracy and efficiency, we propose the Thread Inference Model (TIM), a family of LLMs trained for recursive and decompositional problem solving, and TIMRUN, an inference runtime that enables long-horizon structured reasoning beyond context limits. Together, TIM hosted on TIMRUN supports virtually unlimited working memory and multi-hop tool calls within a single language model inference, overcoming output limits, positional-embedding constraints, and GPU-memory bottlenecks. This is achieved by modeling natural language not as a linear sequence but as a reasoning tree, measured by both length and depth. The reasoning trees consist of tasks with thoughts, recursive subtasks, and conclusions, building on the concept proposed in Schroeder et al. (2025). During generation, we maintain a working memory that retains only the key-value states of the most relevant context tokens, selected by a rule-based subtask-pruning mechanism, enabling positional embeddings and GPU memory pages to be reused throughout reasoning. Experimental results show that our system sustains high inference throughput even when manipulating up to 90% of the KV cache in GPU memory. It also delivers accurate reasoning on mathematical tasks and handles information-retrieval challenges that require long-horizon reasoning and multi-hop tool use.
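To make the reasoning-tree and subtask-pruning ideas concrete, here is a minimal Python sketch under our own assumptions: `Task`, `prune`, and the KV-position sets are illustrative names, not the paper's actual data structures, and the real mechanism also covers tool calls and finer retention rules.

```python
# Minimal sketch of reasoning-tree subtask pruning; all names here are
# illustrative assumptions, not TIM/TIMRUN's actual API.
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Task:
    thought_kv: set[int]                                  # positions of this task's thought tokens
    conclusion_kv: set[int] = field(default_factory=set)  # positions of its conclusion tokens
    subtasks: list[Task] = field(default_factory=list)    # recursive decomposition
    done: bool = False                                    # True once the conclusion is emitted

def prune(task: Task, working_memory: set[int]) -> None:
    """Rule-based subtask pruning: once a task has concluded, its
    subtasks' thought tokens are evicted from working memory, freeing
    their positional embeddings and GPU pages for reuse; each subtask's
    conclusion stays visible so later steps can build on the result."""
    for sub in task.subtasks:
        prune(sub, working_memory)
        if task.done:
            working_memory -= sub.thought_kv

# Example: the root has concluded, so the finished subtask's thought
# tokens (positions 2-4) are evicted; its conclusion (position 5) stays.
sub = Task(thought_kv={2, 3, 4}, conclusion_kv={5}, done=True)
root = Task(thought_kv={0, 1}, conclusion_kv={6}, subtasks=[sub], done=True)
mem = set(range(7))
prune(root, mem)
print(sorted(mem))  # [0, 1, 5, 6]
```

The intuition behind the rule is that a concluded subtask's intermediate reasoning can no longer change the answer, so only its conclusion needs to stay attendable.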

Community

Paper author Paper submitter

The TIMRUN API is live at https://subconscious.dev

I am very ready to stop having my workflows interrupted because I need to start a new chat - really excited about the potential of LLM assistants that aren't memory constrained! Can't wait to see where this goes.

Give me the memory! This is awesome.

Ok this is promising!

Really excited about this! I've faced this problem myself with the startup I'm building!

Great research.
Are you keeping the dataset and TIMRUN engine under wraps, or are there plans to release them?

The reason I ask is that I'm interested in how you parse the hierarchy, considering it's autoregressive. I.e., do you output the entire hierarchy depth-first in its entirety and then execute the tools once the plan is fixed? Or do you go breadth-first one level at a time and wait for the tool results, so that the plan dynamically unfolds at "runtime" with the emergent ability to self-correct? If so, how do you deal with dependencies between nodes, critical paths, etc.?

Paper author

Thank you for your interest and question! The structure unfolds dynamically with autoregressive generation, without any prior settings for depth- or breadth-first search or pauses. We handle tool calls and subtask pruning on the fly.
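For readers wondering what such a single-pass, dynamically unfolding loop might look like, here is a toy, self-contained Python sketch; the delimiter tokens, the scripted `next_token`, and `run_tool` are all invented for illustration, not TIMRUN's actual format or internals.

```python
# Toy decode loop: the tree unfolds token by token, tool results are
# injected the moment a call is emitted, and a subtask's tokens are
# pruned as soon as it closes; no plan is fixed ahead of time.
OPEN, CLOSE, CALL, EOS = "<sub>", "</sub>", "<call>", "<eos>"

def decode(next_token, run_tool):
    """next_token(context) and run_tool(arg) are caller-supplied
    stand-ins for the model and the tool runtime."""
    context, stack = [], []           # retained tokens / open-subtask starts
    while True:
        tok = next_token(context)     # one ordinary autoregressive step
        context.append(tok)
        if tok == OPEN:
            stack.append(len(context) - 1)
        elif tok == CALL:
            arg = context[-2]                # token before <call> is the argument
            context.append(run_tool(arg))    # result fed straight back into context
        elif tok == CLOSE:
            start = stack.pop()
            conclusion = context[-2]         # keep only the subtask's conclusion...
            context[start:] = [conclusion]   # ...and evict the rest of its tokens
        elif tok == EOS:
            return context

# Example: plan, spawn one subtask that calls a tool, prune it, finish.
script = iter(["plan", OPEN, "lookup", CALL, "answer=42", CLOSE, "done", EOS])
print(decode(lambda ctx: next(script), lambda arg: f"result({arg})"))
# ['plan', 'answer=42', 'done', '<eos>']
```

Because generation never pauses for a fixed plan, a tool result that contradicts an earlier thought simply conditions the next tokens, which is where the self-correction the question asks about would come from.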

We will keep updating our repo and progressively release data and examples. We will decide on the best time to release the system.


Models citing this paper: 1
Datasets citing this paper: 0
Spaces citing this paper: 0
Collections including this paper: 14