Generative AI Act II: Test Time Scaling Drives Cognition Engineering
Abstract
The first generation of Large Language Models - what might be called "Act I" of generative AI (2020-2023) - achieved remarkable success through massive parameter and data scaling, yet exhibited fundamental limitations: knowledge latency, shallow reasoning, and constrained cognitive processes. During this era, prompt engineering emerged as our primary interface with AI, enabling dialogue-level communication through natural language. We are now witnessing the emergence of "Act II" (2024-present), in which models are transitioning from knowledge-retrieval systems (in latent space) to thought-construction engines through test-time scaling techniques. This new paradigm establishes a mind-level connection with AI through language-based thoughts. In this paper, we clarify the conceptual foundations of cognition engineering and explain why this moment is critical for its development. We systematically break down these advanced approaches through comprehensive tutorials and optimized implementations, democratizing access to cognition engineering so that every practitioner can participate in AI's second act. We maintain a regularly updated collection of papers on test-time scaling in the GitHub repository: https://github.com/GAIR-NLP/cognition-engineering
Community
This paper comprehensively introduces the characteristics, technical approaches, application prospects, and future directions of the second act of generative AI development, providing valuable insights for diverse audiences:
👩‍🔬 As an AI researcher, are you seeking new research directions to break through current large language model bottlenecks?
💻 As an AI application engineer, do you need hands-on, experience-based tutorials for implementing test-time scaling in your specific use cases?
📚 As a student or AI newcomer, are you looking for a systematic framework to understand "cognition engineering" and "test-time scaling," complete with beginner-friendly code tutorials? And with the abundance of RL scaling training techniques, how can you organize them effectively?
👩‍🏫 As an educator, do you require well-structured teaching resources to explain "test-time scaling" concepts to your students?
This article delivers essential systematic resources:
✨ A comprehensive workflow diagram for applying test-time scaling across domains, with practical examples spanning mathematics, code, multimodal reasoning, agents, embodied AI, safety, retrieval-augmented generation, and evaluation.
📊 A detailed overview of methods to enhance test-time scaling efficiency, covering techniques like parallel sampling, tree search, multi-turn correction, and long CoT (see the minimal sketch after this list).
🧩 Practical guidance on leveraging reinforcement learning to unlock long CoT capabilities, including code tutorials, implementation summaries, and strategies for addressing common training challenges.
📚 A valuable compilation of long CoT resources across various domains.
🔭 Ongoing tracking of test-time scaling frontiers and emerging research developments.
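To make the core idea concrete before the full tutorials, here is a minimal sketch of the simplest test-time scaling technique named above: parallel sampling with majority voting (self-consistency). All names here, including `sample_answer`, are illustrative placeholders under stated assumptions, not APIs from the cognition-engineering repository; substitute any LLM client that returns a final answer string.

```python
# A minimal sketch of test-time scaling via parallel sampling plus majority
# voting (self-consistency). Every name here is an illustrative placeholder,
# not an API from the cognition-engineering repository.
import random
from collections import Counter

def sample_answer(prompt: str, temperature: float = 0.8) -> str:
    """Hypothetical stand-in for one sampled LLM reasoning path.

    A real implementation would query a model at the given temperature and
    parse the final answer out of its chain of thought.
    """
    return random.choice(["42", "42", "41"])  # toy distribution over answers

def self_consistency(prompt: str, n_samples: int = 16) -> str:
    """Draw independent reasoning paths and return the majority answer.

    Increasing n_samples spends more compute at inference time in exchange
    for accuracy: the simplest knob of test-time scaling.
    """
    answers = [sample_answer(prompt) for _ in range(n_samples)]
    majority_answer, _count = Counter(answers).most_common(1)[0]
    return majority_answer

if __name__ == "__main__":
    print(self_consistency("What is 6 * 7? Think step by step."))
```

Tree search and multi-turn correction refine this same idea by allocating the extra test-time compute adaptively rather than uniformly across independent samples.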
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models (2025)
- Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models (2025)
- General Scales Unlock AI Evaluation with Explanatory and Predictive Power (2025)
- A Survey on Post-training of Large Language Models (2025)
- Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models (2025)
- A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems (2025)
- Generalising from Self-Produced Data: Model Training Beyond Human Constraints (2025)