Synthesis Condition Predictor
This model predicts optimal temperature bins and atmosphere categories for inorganic material synthesis. It was trained on a dataset of text-mined synthesis procedures (Kononova et al.): https://www.nature.com/articles/s41597-019-0224-1
Models Included:
- Temperature Bin Prediction (LightGBM)
- Atmosphere Category Prediction (LightGBM)
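The temperature model outputs one of four bins rather than an exact temperature. As a rough illustration of what those classes mean, the sketch below maps a temperature to the bin labels listed in the evaluation report. The helper name `temperature_to_bin` is hypothetical and not part of the repository, and the unit (degrees Celsius) is an assumption based on the usage example, since the card does not state it.

```python
import bisect

# Upper bin edges and labels copied from the per-class evaluation report.
# Intervals are right-closed, e.g. (900, 1100].
BIN_UPPER_EDGES = [900, 1100, 1300, 3000]
BIN_LABELS = [
    "TempBin_1_(1_to_900]",
    "TempBin_2_(900_to_1100]",
    "TempBin_3_(1100_to_1300]",
    "TempBin_4_(1300_to_3000]",
]
LOWEST_EDGE = 1  # exclusive lower bound of the first bin


def temperature_to_bin(temp_c):
    """Hypothetical helper: return the bin label, or None if out of range.

    Assumes temp_c is in degrees Celsius (not confirmed by the model card).
    """
    if not (LOWEST_EDGE < temp_c <= BIN_UPPER_EDGES[-1]):
        return None
    # bisect_left returns the index of the first edge >= temp_c,
    # which is the correct bin for right-closed intervals.
    return BIN_LABELS[bisect.bisect_left(BIN_UPPER_EDGES, temp_c)]


print(temperature_to_bin(920))
print(temperature_to_bin(900))
```

A value exactly on an edge, such as 900, falls in the lower bin because the intervals are closed on the right.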
Intended Use: To assist researchers in designing synthesis experiments by predicting key process parameters. Input a target material, precursors, and basic operational details to get predictions.
How to Use:
# Ensure the inference script and its dependencies are on PYTHONPATH, e.g.:
# from synthesis_predictor_hf_repo.src.inference import predict_synthesis_outcome, load_all_artifacts_once
# Or, if running from a cloned repo where 'src' is a subdirectory:
from src.inference import predict_synthesis_outcome, load_all_artifacts_once

# Load the model artifacts once before making predictions.
if not load_all_artifacts_once():
    print("Failed to load model artifacts.")
else:
    raw_input_example = {
        'target_formula_raw': "YBa2Cu3O7",
        'precursor_formulas_raw': ["Y2O3", "BaCO3", "CuO"],
        'operations_simplified_list': [
            {'type': 'MixingOperation', 'string': 'Ball milling for 2h',
             'conditions': {'duration': [{'value': 2, 'unit': 'h'}]}},
            {'type': 'HeatingOperation', 'string': 'Calcined at 920C for 10h in air',
             'conditions': {'heating_temperature': [{'value': 920}], 'heating_time': [{'value': 10}], 'atmosphere': 'air'}},
            {'type': 'HeatingOperation', 'string': 'Sintered at 950C for 20h in O2',
             'conditions': {'heating_temperature': [{'value': 950}], 'heating_time': [{'value': 20}], 'atmosphere': 'Oxygen'}},
        ],
        'reactants_coeffs': [("Y2O3", 0.5), ("BaCO3", 2.0), ("CuO", 3.0)],  # Example, adjust as needed
        'products_coeffs': [("YBa2Cu3O7", 1.0)],  # Example
    }
    predictions = predict_synthesis_outcome(raw_input_example)
    print(predictions)
Limitations:
- Accuracy on the hold-out test set is modest: about 68% for the temperature bin model and about 72% for the atmosphere category model.
- Predictions are based on patterns in the training data and may not generalize to all chemical systems.
- The feature engineering for process parameters in the inference script relies on the user providing an operations_simplified_list that the internal logic can parse. The quality of these inputs directly affects prediction accuracy.
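Because predictions depend on a well-formed operations_simplified_list, it may help to check the input before calling the predictor. The sketch below is a minimal pre-flight check based only on the structure of the usage example in this card. `check_operations` is a hypothetical helper, not part of the repository, and the keys it treats as required are an assumption about what the internal parser expects.

```python
def check_operations(operations):
    """Hypothetical helper: return a list of problems found in an operations list."""
    if not isinstance(operations, list) or not operations:
        return ["operations_simplified_list must be a non-empty list"]

    problems = []
    for i, op in enumerate(operations):
        if not isinstance(op, dict):
            problems.append(f"operation {i}: expected a dict")
            continue
        # Keys taken from the usage example; assumed (not confirmed) to be required.
        for key in ("type", "string", "conditions"):
            if key not in op:
                problems.append(f"operation {i}: missing key '{key}'")
        conditions = op.get("conditions", {})
        if not isinstance(conditions, dict):
            problems.append(f"operation {i}: 'conditions' must be a dict")
            continue
        # Heating steps in the usage example always carry a temperature entry.
        if op.get("type") == "HeatingOperation" and not conditions.get("heating_temperature"):
            problems.append(f"operation {i}: heating step has no heating_temperature")
    return problems


example_ops = [
    {"type": "MixingOperation", "string": "Ball milling for 2h",
     "conditions": {"duration": [{"value": 2, "unit": "h"}]}},
    {"type": "HeatingOperation", "string": "Calcined at 920C for 10h in air",
     "conditions": {"heating_temperature": [{"value": 920}],
                    "heating_time": [{"value": 10}], "atmosphere": "air"}},
]
print(check_operations(example_ops))
```

An empty result only means the input matches the shape of the documented example; it does not guarantee that the inference script will parse every field.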
Training Data: The model was trained on the text-mined dataset of inorganic synthesis procedures published by Kononova et al.: https://www.nature.com/articles/s41597-019-0224-1
Evaluation Results: The models were evaluated on a hold-out test set.
1. Tuned Temperature Bin Prediction Model:
- Overall Test Set Accuracy: 0.6821
- Overall Test Set F1 Score (Weighted): 0.6785
- Per-Class Performance (Test Set):
                            precision    recall  f1-score   support
  TempBin_1_(1_to_900]           0.77      0.79      0.78       954
  TempBin_2_(900_to_1100]        0.62      0.53      0.57       743
  TempBin_3_(1100_to_1300]       0.58      0.58      0.58       768
  TempBin_4_(1300_to_3000]       0.72      0.80      0.76       715
  accuracy                                           0.68      3180
  macro avg                      0.67      0.68      0.67      3180
  weighted avg                   0.68      0.68      0.68      3180
2. Tuned Atmosphere Category Prediction Model:
- Overall Test Set Accuracy: 0.7193
- Overall Test Set F1 Score (Weighted): 0.7174
- Per-Class Performance (Test Set):
                            precision    recall  f1-score   support
  Inert                          0.59      0.38      0.46       139
  Other_Atm_Target               1.00      0.44      0.62         9
  Oxidizing                      0.67      0.71      0.69      1552
  Reducing                       0.70      0.47      0.56       100
  Unknown_Atm_Category           0.76      0.76      0.76      2098
  accuracy                                           0.72      3898
  macro avg                      0.74      0.55      0.62      3898
  weighted avg                   0.72      0.72      0.72      3898
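The macro and weighted averages in these reports follow directly from the per-class rows. As a check on the arithmetic, the sketch below recomputes both F1 averages for the atmosphere model from the per-class values reported above. The inputs are already rounded to two decimals, so the results can only be expected to agree with the report to about two decimals.

```python
# (f1-score, support) per class, copied from the atmosphere report.
atmosphere_rows = {
    "Inert": (0.46, 139),
    "Other_Atm_Target": (0.62, 9),
    "Oxidizing": (0.69, 1552),
    "Reducing": (0.56, 100),
    "Unknown_Atm_Category": (0.76, 2098),
}


def macro_f1(rows):
    """Unweighted mean of per-class F1: every class counts equally."""
    return sum(f1 for f1, _ in rows.values()) / len(rows)


def weighted_f1(rows):
    """Mean of per-class F1 weighted by class support."""
    total = sum(support for _, support in rows.values())
    return sum(f1 * support for f1, support in rows.values()) / total


print(f"macro F1:    {macro_f1(atmosphere_rows):.2f}")     # 0.62
print(f"weighted F1: {weighted_f1(atmosphere_rows):.2f}")  # 0.72
```

The gap between the two averages reflects class imbalance: the weighted figure is dominated by the large Oxidizing and Unknown_Atm_Category classes, while the macro figure is pulled down by weaker scores on the small Inert and Reducing classes.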