Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents
Abstract
Scientific experimentation, a cornerstone of human progress, demands rigor in reliability, methodical control, and interpretability to yield meaningful results. Despite the growing capabilities of large language models (LLMs) in automating different aspects of the scientific process, automating rigorous experimentation remains a significant challenge. To address this gap, we propose Curie, an AI agent framework designed to embed rigor into the experimentation process through three key components: an intra-agent rigor module to enhance reliability, an inter-agent rigor module to maintain methodical control, and an experiment knowledge module to enhance interpretability. To evaluate Curie, we design a novel experimental benchmark of 46 questions across four computer science domains, derived from influential research papers and widely adopted open-source projects. Compared to the strongest baseline tested, Curie achieves a 3.4× improvement in correctly answering experimental questions. Curie is open-sourced at https://github.com/Just-Curieous/Curie.
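To make the three-module design concrete, below is a minimal, illustrative Python sketch of how an intra-agent reliability check, an inter-agent plan check, and an experiment knowledge log might compose around a single experimentation step. All names here (`IntraAgentRigor`, `InterAgentRigor`, `ExperimentKnowledge`, `run_step`) are hypothetical and are not taken from the Curie codebase; this is a sketch of the stated architecture, not the actual implementation.

```python
# Illustrative sketch only: class and method names are hypothetical and
# do not come from the Curie repository. The sketch shows how the three
# rigor modules described in the abstract might gate and record one
# experimentation step.

from dataclasses import dataclass, field


@dataclass
class Action:
    description: str
    command: str


@dataclass
class IntraAgentRigor:
    """Reliability check on a single agent's proposed action,
    e.g., that the command is well-formed before it executes."""

    def validate(self, action: Action) -> bool:
        return bool(action.command.strip())


@dataclass
class InterAgentRigor:
    """Methodical control across agents: each step must follow
    the approved experiment plan, in order."""

    plan: list
    cursor: int = 0

    def on_plan(self, action: Action) -> bool:
        if self.cursor < len(self.plan) and self.plan[self.cursor] in action.description:
            self.cursor += 1
            return True
        return False


@dataclass
class ExperimentKnowledge:
    """Records every outcome so results stay interpretable and auditable."""

    log: list = field(default_factory=list)

    def record(self, action: Action, result: str) -> None:
        self.log.append(f"{action.description} -> {result}")


def run_step(action, intra, inter, knowledge):
    # Both rigor modules must approve the action before it runs.
    if not intra.validate(action):
        knowledge.record(action, "rejected: unreliable action")
        return
    if not inter.on_plan(action):
        knowledge.record(action, "rejected: deviates from plan")
        return
    result = "ok"  # placeholder for actually executing the command
    knowledge.record(action, result)


if __name__ == "__main__":
    intra = IntraAgentRigor()
    inter = InterAgentRigor(plan=["setup", "run", "analyze"])
    knowledge = ExperimentKnowledge()
    for step, cmd in [("setup environment", "pip install -r requirements.txt"),
                      ("run benchmark", "python run.py"),
                      ("analyze results", "python analyze.py")]:
        run_step(Action(step, cmd), intra, inter, knowledge)
    print("\n".join(knowledge.log))
```

In this sketch, an action executes only when both rigor modules approve it, and every outcome, including rejections, is logged for later interpretation; the actual Curie architecture may differ in both interfaces and granularity.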
Community
Move Scientific Research at the Speed of Thought. This paper introduces Curie, an AI agent framework designed to automate scientific research experimentation. By integrating modules that enhance reliability, enforce methodical control, and improve interpretability, Curie addresses the critical challenges of automating rigorous experimentation. Curie is able to reproduce several AI research papers through experimentation.
Evaluated against an experimentation benchmark spanning multiple computer science domains, Curie demonstrated a 3.4× improvement in accurately answering experimental questions compared to existing baselines.