The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
Abstract
TAIL, a method that imitates Turing Machine execution processes, enhances the length generalization and performance of LLMs by synthesizing chain-of-thought data and reducing shortcut learning.
Length generalization, the ability to solve problems with longer sequences than those observed during training, poses a core challenge for Transformer-based large language models (LLMs). Although existing studies have predominantly focused on data-driven approaches for arithmetic operations and symbolic manipulation tasks, these approaches tend to be task-specific with limited overall performance. To pursue a more general solution, this paper focuses on the broader case of reasoning problems that are computable, i.e., problems that algorithms can solve and that a Turing Machine can therefore solve. From this perspective, this paper proposes Turing MAchine Imitation Learning (TAIL) to improve the length generalization ability of LLMs. TAIL uses computer programs to synthesize chain-of-thought (CoT) data that imitates the execution process of a Turing Machine: it linearly expands the reasoning steps into atomic states to alleviate shortcut learning, and it introduces an explicit memory-fetch mechanism to reduce the difficulty of dynamic and long-range data access in elementary operations. To validate the reliability and universality of TAIL, we construct a challenging synthetic dataset covering 8 classes of algorithms and 18 tasks. Without bells and whistles, TAIL significantly improves both the length generalization ability and the performance of Qwen2.5-7B on various tasks using only synthetic data, surpassing previous methods and DeepSeek-R1. The experimental results reveal that the key concepts of the Turing Machine, rather than particular thinking styles, are indispensable to TAIL's length generalization; through them, the model exhibits read-and-write behaviors in its attention layers consistent with the properties of a Turing Machine. This work provides a promising direction for future research on learning LLM reasoning from synthetic data.
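The abstract's two core ingredients, linearized atomic states and an explicit memory-fetch mechanism, can be illustrated with a small synthetic-data generator. The sketch below is a hypothetical example for multi-digit addition; the trace format, field names, and the `synthesize_addition_cot` function are our own illustration, not the paper's exact data schema. A program plays the role of the Turing Machine, and every atomic step restates the cells it reads before performing a single write.

```python
# Minimal, hypothetical sketch of TAIL-style CoT synthesis for addition.
# Each emitted line is one atomic state: an explicit re-statement (memory
# fetch) of the cells being read, followed by the single write at that step.

def synthesize_addition_cot(a: int, b: int) -> str:
    xs = str(a)[::-1]          # tape 1: digits of a, least significant first
    ys = str(b)[::-1]          # tape 2: digits of b, least significant first
    n = max(len(xs), len(ys))
    carry = 0
    result = []                # output tape, least significant digit first
    lines = [f"TASK: {a} + {b}"]

    for i in range(n):
        # Explicit memory fetch: copy the cell contents into the trace
        # instead of relying on long-range attention back to the inputs.
        dx = int(xs[i]) if i < len(xs) else 0
        dy = int(ys[i]) if i < len(ys) else 0
        lines.append(f"STATE {i}: fetch a[{i}]={dx}, fetch b[{i}]={dy}, carry={carry}")

        # One elementary operation per atomic state.
        s = dx + dy + carry
        digit, carry = s % 10, s // 10
        result.append(str(digit))
        lines.append(f"STATE {i}: write out[{i}]={digit}, new carry={carry}")

    if carry:
        result.append(str(carry))
        lines.append(f"STATE {n}: write out[{n}]={carry}, new carry=0")

    answer = "".join(reversed(result))
    lines.append(f"ANSWER: {answer}")
    assert int(answer) == a + b    # the simulator guarantees correct traces
    return "\n".join(lines)


if __name__ == "__main__":
    print(synthesize_addition_cot(478, 95))
```

Because the trace is produced by actually simulating the algorithm, every intermediate state is verifiably correct, and the same pattern extends to the other algorithmic tasks by swapping in a different simulator.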
Community
Train LLMs to imitate Turing Machines for universal length generalization on a diverse synthetic dataset of 18 tasks across 8 algorithmic classes.
Related paper: "Universal Length Generalization with Turing Programs"
https://huggingface.co/papers/2407.03310
Thank you for your interest in our work! ☺️
We have indeed cited the paper you mentioned and greatly appreciate its contributions. In our study, we also carefully highlighted three notable differences:
1. We introduce more explicit core modules for length generalization by integrating the RASP-L hypothesis with the formal definition of Turing Machines, and we provide experimental verification of both sufficiency and necessity.
2. We evaluate on a broader task set of 18 algorithmic problems to rigorously demonstrate universality and reliability, which are central goals of the TAIL framework.
3. Our method achieves strong results without relying on specialized positional encodings like Hard-ALiBi; instead, we demonstrate robust effectiveness using RoPE, which is more commonly adopted in modern LLMs.
Thanks again for the thoughtful comment — I’d be happy to discuss further~