Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge
Abstract
Materials synthesis is vital for innovations such as energy storage, catalysis, electronics, and biomedical devices. Yet, the process relies heavily on empirical, trial-and-error methods guided by expert intuition. Our work aims to support the materials science community by providing a practical, data-driven resource. We have curated a comprehensive dataset of 17K expert-verified synthesis recipes from open-access literature, which forms the basis of our newly developed benchmark, AlchemyBench. AlchemyBench offers an end-to-end framework that supports research in large language models applied to synthesis prediction. It encompasses key tasks, including raw materials and equipment prediction, synthesis procedure generation, and characterization outcome forecasting. We propose an LLM-as-a-Judge framework that leverages large language models for automated evaluation, demonstrating strong statistical agreement with expert assessments. Overall, our contributions offer a supportive foundation for exploring the capabilities of LLMs in predicting and guiding materials synthesis, ultimately paving the way for more efficient experimental design and accelerated innovation in materials science.
Community
Towards Fully-Automated Materials Discovery: A New Era in Materials Science
We are thrilled to share our latest research, "Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge", which represents a significant step forward in the field of materials science.
The Challenge
Materials synthesis is the backbone of innovations in energy storage, catalysis, electronics, and biomedical devices. However, the traditional trial-and-error approach is time-consuming and heavily reliant on expert intuition. This inefficiency has long hindered progress in the field.
Our Solution
To address these challenges, we introduce Open Materials Guide (OMG), a dataset of 17,000+ high-quality, expert-verified synthesis recipes extracted from open-access literature. Building on this dataset, we developed AlchemyBench, the first end-to-end benchmark for evaluating machine learning models on materials synthesis prediction.
Key features of AlchemyBench include:
- Raw Materials & Equipment Prediction: Models predict essential components for synthesis.
- Synthesis Procedure Generation: Models generate step-by-step synthesis workflows.
- Characterization Outcome Forecasting: Models forecast the outcomes of characterization experiments.
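To make the three tasks concrete, here is a minimal sketch of how a single recipe might be split into per-task prediction targets. The `SynthesisRecipe` class, its field names, and the task keys are hypothetical illustrations, not the actual OMG schema, and the example values are invented placeholders rather than data from the dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SynthesisRecipe:
    """Hypothetical record layout; field names are illustrative, not the OMG schema."""
    target: str
    raw_materials: List[str] = field(default_factory=list)
    equipment: List[str] = field(default_factory=list)
    procedure: List[str] = field(default_factory=list)
    characterization: str = ""


def task_targets(recipe: SynthesisRecipe) -> Dict[str, object]:
    """Map each task listed above to the part of the record a model would predict."""
    return {
        "raw_materials_and_equipment": {
            "raw_materials": recipe.raw_materials,
            "equipment": recipe.equipment,
        },
        "procedure_generation": recipe.procedure,
        "characterization_forecasting": recipe.characterization,
    }


# Invented placeholder values, used only to show the shape of a record.
example = SynthesisRecipe(
    target="example oxide powder",
    raw_materials=["precursor A", "solvent B"],
    equipment=["furnace", "magnetic stirrer"],
    procedure=["Dissolve precursor A in solvent B.", "Dry and calcine the product."],
    characterization="placeholder description of the measured result",
)

targets = task_targets(example)
for task_name in targets:
    print(task_name)
```

Keeping one record per recipe and deriving the task targets from it means all three tasks are evaluated against the same ground truth.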
A Breakthrough Framework: LLM-as-a-Judge
Our research also introduces the LLM-as-a-Judge framework, leveraging large language models (LLMs) to assess synthesis predictions. This framework demonstrates strong alignment with expert evaluations, offering a scalable alternative to costly human assessments.
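One way to check alignment between an LLM judge and human experts is to correlate their scores on the same set of predictions. The sketch below uses Spearman rank correlation as an assumed choice of statistic; the post does not say which measure the paper reports, and the function names are illustrative.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def spearman(judge_scores: Sequence[float], expert_scores: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(judge_scores) != len(expert_scores) or len(judge_scores) < 2:
        raise ValueError("need two equal-length sequences with at least two scores")
    return pearson(average_ranks(judge_scores), average_ranks(expert_scores))
```

A value near 1 means the judge orders predictions the same way the experts do, even if the two use the scoring scale differently. That makes rank correlation a reasonable first check before comparing absolute scores.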
Why It Matters
Our contributions pave the way for:
- Accelerated experimental design.
- Enhanced reproducibility in materials research.
- A deeper understanding of how AI can transform scientific discovery.
Open Access for Collaboration
To foster collaboration and innovation, we have made our dataset and code openly available to the research community through the project's GitHub repository.
Join Us in Shaping the Future
This work marks a pivotal moment in materials science, blending AI and data-driven approaches to revolutionize how we discover and synthesize new materials. We invite researchers, scientists, and enthusiasts to join us in exploring the potential of fully automated materials discovery.