arxiv:2507.16784

Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning

Published on Jul 22
· Submitted by luohy on Jul 23
#1 Paper of the day

Abstract

A Thread Inference Model (TIM) and its runtime (TIMRUN) enable long-horizon reasoning in LLMs by using reasoning trees and key-value state retention, overcoming context and memory limitations.

AI-generated summary

To break the context limits of large language models (LLMs) that bottleneck reasoning accuracy and efficiency, we propose the Thread Inference Model (TIM), a family of LLMs trained for recursive and decompositional problem solving, and TIMRUN, an inference runtime that enables long-horizon structured reasoning beyond context limits. Together, TIM hosted on TIMRUN supports virtually unlimited working memory and multi-hop tool calls within a single language model inference, overcoming output limits, positional-embedding constraints, and GPU-memory bottlenecks. This is achieved by modeling natural language not as a linear sequence but as a reasoning tree, measured by both length and depth. The reasoning trees consist of tasks with thoughts, recursive subtasks, and conclusions, building on the concept proposed in Schroeder et al. (2025). During generation, we maintain a working memory that retains only the key-value states of the most relevant context tokens, selected by a rule-based subtask-pruning mechanism, enabling positional embeddings and GPU memory pages to be reused throughout reasoning. Experimental results show that our system sustains high inference throughput even when manipulating up to 90% of the KV cache in GPU memory. It also delivers accurate reasoning on mathematical tasks and handles information-retrieval challenges that require long-horizon reasoning and multi-hop tool use.
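To make the reasoning-tree and subtask-pruning ideas concrete, here is a minimal Python sketch under our own assumptions: `Task`, `prune`, and the KV-position sets are illustrative names, not the paper's actual data structures, and the real mechanism also covers tool calls and finer retention rules.

```python
# Minimal sketch of reasoning-tree subtask pruning; all names here are
# illustrative assumptions, not TIM/TIMRUN's actual API.
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Task:
    thought_kv: set[int]                                  # positions of this task's thought tokens
    conclusion_kv: set[int] = field(default_factory=set)  # positions of its conclusion tokens
    subtasks: list[Task] = field(default_factory=list)    # recursive decomposition
    done: bool = False                                    # True once the conclusion is emitted

def prune(task: Task, working_memory: set[int]) -> None:
    """Rule-based subtask pruning: once a task has concluded, its
    subtasks' thought tokens are evicted from working memory, freeing
    their positional embeddings and GPU pages for reuse; each subtask's
    conclusion stays visible so later steps can build on the result."""
    for sub in task.subtasks:
        prune(sub, working_memory)
        if task.done:
            working_memory -= sub.thought_kv

# Example: the root has concluded, so the finished subtask's thought
# tokens (positions 2-4) are evicted; its conclusion (position 5) stays.
sub = Task(thought_kv={2, 3, 4}, conclusion_kv={5}, done=True)
root = Task(thought_kv={0, 1}, conclusion_kv={6}, subtasks=[sub], done=True)
mem = set(range(7))
prune(root, mem)
print(sorted(mem))  # [0, 1, 5, 6]
```

The intuition behind the rule is that a concluded subtask's intermediate reasoning can no longer change the answer, so only its conclusion needs to stay attendable.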

Community

Paper author Paper submitter

The TIMRUN API is live at https://subconscious.dev

I am very ready to stop having my workflows interrupted because I need to start a new chat - really excited about the potential of LLM assistants that aren't memory constrained! Can't wait to see where this goes.

Give me the memory! This is awesome.

Ok this is promising!

Really excited about this! I've faced this problem myself with the startup I'm building!

Great research.
Are you keeping the dataset and TIMRUN engine under wraps, or are there plans to release them?

The reason I ask is that I'm interested in how you parse the hierarchy, considering it's autoregressive. I.e., do you output the entire hierarchy depth-first in its entirety and then execute the tools once the plan is fixed? Or do you go breadth-first one level at a time and wait for the tool results, so that the plan dynamically unfolds at "runtime" with the emergent ability to self-correct? If so, how do you deal with dependencies between nodes, critical paths, etc.?

Paper author

Thank you for your interest and question! The structure unfolds dynamically with autoregressive generation, without any prior settings for depth- or breadth-first search or pauses. We handle tool calls and subtask pruning on the fly.
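For readers wondering what such a single-pass, dynamically unfolding loop might look like, here is a toy, self-contained Python sketch; the delimiter tokens, the scripted `next_token`, and `run_tool` are all invented for illustration, not TIMRUN's actual format or internals.

```python
# Toy decode loop: the tree unfolds token by token, tool results are
# injected the moment a call is emitted, and a subtask's tokens are
# pruned as soon as it closes; no plan is fixed ahead of time.
OPEN, CLOSE, CALL, EOS = "<sub>", "</sub>", "<call>", "<eos>"

def decode(next_token, run_tool):
    """next_token(context) and run_tool(arg) are caller-supplied
    stand-ins for the model and the tool runtime."""
    context, stack = [], []           # retained tokens / open-subtask starts
    while True:
        tok = next_token(context)     # one ordinary autoregressive step
        context.append(tok)
        if tok == OPEN:
            stack.append(len(context) - 1)
        elif tok == CALL:
            arg = context[-2]                # token before <call> is the argument
            context.append(run_tool(arg))    # result fed straight back into context
        elif tok == CLOSE:
            start = stack.pop()
            conclusion = context[-2]         # keep only the subtask's conclusion...
            context[start:] = [conclusion]   # ...and evict the rest of its tokens
        elif tok == EOS:
            return context

# Example: plan, spawn one subtask that calls a tool, prune it, finish.
script = iter(["plan", OPEN, "lookup", CALL, "answer=42", CLOSE, "done", EOS])
print(decode(lambda ctx: next(script), lambda arg: f"result({arg})"))
# ['plan', 'answer=42', 'done', '<eos>']
```

Because generation never pauses for a fixed plan, a tool result that contradicts an earlier thought simply conditions the next tokens, which is where the self-correction the question asks about would come from.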

We will keep updating our repo and progressively release data and examples. We will decide on the best time to release the system.


Models citing this paper: 1
Datasets citing this paper: 0
Spaces citing this paper: 0
Collections including this paper: 14