ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
Abstract
ProtoReasoning enhances large reasoning models through prototypical representations, leading to improved cross-domain generalization in logical reasoning, planning, and other tasks.
Recent advances in Large Reasoning Models (LRMs) trained with Long Chain-of-Thought (Long CoT) reasoning have demonstrated remarkable cross-domain generalization capabilities. However, the underlying mechanisms supporting such transfer remain poorly understood. We hypothesize that cross-domain generalization arises from shared abstract reasoning prototypes -- fundamental reasoning patterns that capture the essence of problems across domains. These prototypes minimize the nuances of the representation, revealing that seemingly diverse tasks are grounded in shared reasoning structures. Based on this hypothesis, we propose ProtoReasoning, a framework that enhances the reasoning ability of LLMs by leveraging scalable and verifiable prototypical representations (Prolog for logical reasoning, PDDL for planning). ProtoReasoning features: (1) an automated prototype construction pipeline that transforms problems into corresponding prototype representations; (2) a comprehensive verification system providing reliable feedback through Prolog/PDDL interpreters; (3) the scalability to synthesize problems arbitrarily within prototype space while ensuring correctness. Extensive experiments show that ProtoReasoning achieves a 4.7% improvement over baseline models on logical reasoning (Enigmata-Eval), a 6.3% improvement on planning tasks, a 4.0% improvement on general reasoning (MMLU), and a 1.0% improvement on mathematics (AIME24). Significantly, our ablation studies confirm that learning in prototype space also demonstrates enhanced generalization to structurally similar problems compared to training solely on natural language representations, validating our hypothesis that reasoning prototypes serve as the foundation for generalizable reasoning in large language models.
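To make the verification idea concrete, here is a minimal sketch of the interpreter-in-the-loop check the abstract describes, assuming SWI-Prolog is installed and available as the `swipl` binary. The puzzle encoded in `PROTOTYPE` and the `verify_with_prolog` helper are illustrative stand-ins, not the paper's actual pipeline or data.

```python
import subprocess
import tempfile
import textwrap

# Hypothetical prototype: a small logic puzzle expressed as a Prolog
# program. The facts and constraint below are illustrative only.
PROTOTYPE = textwrap.dedent("""\
    solution(Alice, Bob) :-
        member(Alice, [cat, dog]),
        member(Bob, [cat, dog]),
        Alice \\= Bob,
        Alice \\= dog.          % constraint: Alice does not own the dog

    main :-
        solution(A, B),
        format("~w ~w~n", [A, B]).
""")

def verify_with_prolog(program: str) -> str | None:
    """Run a candidate Prolog prototype through SWI-Prolog and return
    its output, or None if the program fails or raises an error."""
    with tempfile.NamedTemporaryFile("w", suffix=".pl", delete=False) as f:
        f.write(program)
        path = f.name
    # swipl -q suppresses the banner; -g main runs the entry goal;
    # -t halt exits instead of dropping into the interactive toplevel.
    result = subprocess.run(
        ["swipl", "-q", "-g", "main", "-t", "halt", path],
        capture_output=True, text=True, timeout=10,
    )
    return result.stdout.strip() if result.returncode == 0 else None

if __name__ == "__main__":
    output = verify_with_prolog(PROTOTYPE)
    # A model-generated answer would be accepted for training only if
    # it matches the interpreter's verified output (here: "cat dog").
    print("verified output:", output)
```

The same pattern presumably carries over to the planning side, with a PDDL plan validator playing the role that the Prolog interpreter plays here.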
Community
Interesting work, but I expected the performance boost to be much larger, especially for AIME24. In optillm (https://github.com/codelion/optillm), when used with z3, we actually see a significant improvement on AIME24. For instance, with qwen2.5:14b-instruct-fp16 (via ollama) we saw the AIME24 score go from 10.00 to 20.00.