Papers
arxiv:2502.16069

Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents

Published on Feb 22
· Submitted by AmberLJC on Feb 26
Abstract

Scientific experimentation, a cornerstone of human progress, demands rigor in reliability, methodical control, and interpretability to yield meaningful results. Despite the growing capabilities of large language models (LLMs) in automating different aspects of the scientific process, automating rigorous experimentation remains a significant challenge. To address this gap, we propose Curie, an AI agent framework designed to embed rigor into the experimentation process through three key components: an intra-agent rigor module to enhance reliability, an inter-agent rigor module to maintain methodical control, and an experiment knowledge module to enhance interpretability. To evaluate Curie, we design a novel experimental benchmark composed of 46 questions across four computer science domains, derived from influential research papers and widely adopted open-source projects. Compared to the strongest baseline tested, we achieve a 3.4× improvement in correctly answering experimental questions. Curie is open-sourced at https://github.com/Just-Curieous/Curie.

Community

Paper submitter

Move Scientific Research at the Speed of Thought. This paper introduces Curie, an AI agent framework designed to automate scientific research experimentation. By integrating modules that enhance reliability, enforce methodical control, and improve interpretability, Curie addresses the critical challenges of automating rigorous experimentation. Curie is able to reproduce several AI research papers through experimentation.
Evaluated against an experimentation benchmark spanning multiple computer science domains, Curie demonstrated a 3.4× improvement in accurately answering experimental questions compared to existing baselines.

