Synthesis Condition Predictor

This model predicts optimal temperature bins and atmosphere categories for inorganic material synthesis. It was trained on a dataset of text-mined synthesis procedures (Kononova et al.): https://www.nature.com/articles/s41597-019-0224-1

Models Included:

Temperature Bin Prediction (LightGBM)
Atmosphere Category Prediction (LightGBM)

Intended Use: To assist researchers in designing synthesis experiments by predicting key process parameters. Input a target material, precursors, and basic operational details to get predictions.

How to Use:

# Ensure your inference script and its dependencies are on the PYTHONPATH.
# Alternative import path:
# from synthesis_predictor_hf_repo.src.inference import predict_synthesis_outcome, load_all_artifacts_once

# If running from a cloned repo where 'src' is a subdirectory:
from src.inference import predict_synthesis_outcome, load_all_artifacts_once

# Load the model artifacts once before predicting.
if not load_all_artifacts_once():
    print("Failed to load model artifacts.")
else:
    raw_input_example = {
        'target_formula_raw': "YBa2Cu3O7",
        'precursor_formulas_raw': ["Y2O3", "BaCO3", "CuO"],
        'operations_simplified_list': [
            {'type': 'MixingOperation', 'string': 'Ball milling for 2h',
             'conditions': {'duration': [{'value': 2, 'unit': 'h'}]}},
            {'type': 'HeatingOperation', 'string': 'Calcined at 920C for 10h in air',
             'conditions': {'heating_temperature': [{'value': 920}], 'heating_time': [{'value': 10}], 'atmosphere': 'air'}},
            {'type': 'HeatingOperation', 'string': 'Sintered at 950C for 20h in O2',
             'conditions': {'heating_temperature': [{'value': 950}], 'heating_time': [{'value': 20}], 'atmosphere': 'Oxygen'}}
        ],
        'reactants_coeffs': [("Y2O3", 0.5), ("BaCO3", 2.0), ("CuO", 3.0)],  # Example, adjust as needed
        'products_coeffs': [("YBa2Cu3O7", 1.0)]  # Example
    }
    predictions = predict_synthesis_outcome(raw_input_example)
    print(predictions)

Limitations:

On the hold-out test set, accuracy is about 0.68 for the temperature bin model and about 0.72 for the atmosphere category model.
Predictions are based on patterns in the training data and may not generalize to all chemical systems.
The feature engineering for process parameters in the inference script relies on the user providing an operations_simplified_list that can be parsed by the internal logic. The quality of these inputs directly affects prediction accuracy.
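
Because predictions depend on a parseable operations_simplified_list, it may help to check the input structure before calling the predictor. The sketch below is not part of the inference script: `find_input_problems` is a hypothetical helper, and the key names are simply those that appear in the usage example above. Treating all of them as required is an assumption.

```python
from typing import Any

# Keys taken from the usage example in this README. Treating all of them as
# required is an assumption, not documented behavior of the inference script.
EXPECTED_TOP_LEVEL_KEYS = (
    "target_formula_raw",
    "precursor_formulas_raw",
    "operations_simplified_list",
    "reactants_coeffs",
    "products_coeffs",
)
EXPECTED_OPERATION_KEYS = ("type", "string", "conditions")


def find_input_problems(raw_input: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = [
        f"missing key: {key!r}"
        for key in EXPECTED_TOP_LEVEL_KEYS
        if key not in raw_input
    ]

    operations = raw_input.get("operations_simplified_list", [])
    if not isinstance(operations, list):
        return problems + ["operations_simplified_list must be a list"]

    for index, operation in enumerate(operations):
        if not isinstance(operation, dict):
            problems.append(f"operation {index} is not a dict")
            continue
        for key in EXPECTED_OPERATION_KEYS:
            if key not in operation:
                problems.append(f"operation {index} is missing {key!r}")
    return problems


# A deliberately incomplete input to show what the helper reports.
incomplete_input = {
    "target_formula_raw": "YBa2Cu3O7",
    "operations_simplified_list": [{"type": "HeatingOperation"}],
}
for problem in find_input_problems(incomplete_input):
    print(problem)
```

A check like this only catches structural gaps. It cannot tell whether the internal parsing logic will interpret the values as intended.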

Training Data: The models were trained on the text-mined dataset of inorganic synthesis procedures published by Kononova et al.: https://www.nature.com/articles/s41597-019-0224-1

Evaluation Results: The models were evaluated on a hold-out test set.

1. Tuned Temperature Bin Prediction Model:

Overall Test Set Accuracy: 0.6821
Overall Test Set F1 Score (Weighted): 0.6785

Per-Class Performance (Test Set):

                              precision    recall  f1-score   support

    TempBin_1_(1_to_900]       0.77      0.79      0.78       954
 TempBin_2_(900_to_1100]       0.62      0.53      0.57       743
TempBin_3_(1100_to_1300]       0.58      0.58      0.58       768
TempBin_4_(1300_to_3000]       0.72      0.80      0.76       715

                accuracy                           0.68      3180
               macro avg       0.67      0.68      0.67      3180
            weighted avg       0.68      0.68      0.68      3180
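
The class labels in the table above encode right-closed temperature intervals, for example (900_to_1100]. As a minimal sketch of how a temperature would map onto these labels, the snippet below rebuilds the label from the bin edges. `temperature_bin_label` is a hypothetical helper, not a function from the repository, and the unit is assumed to be degrees Celsius based on the "920C" style of the usage example.

```python
from bisect import bisect_left
from typing import Optional

# Bin edges read off the class labels in the table above. The unit is assumed
# to be degrees Celsius, matching the "920C" style of the usage example.
BIN_EDGES = [1, 900, 1100, 1300, 3000]


def temperature_bin_label(temperature: float) -> Optional[str]:
    """Map a temperature to its bin label, or None if outside (1, 3000]."""
    if not BIN_EDGES[0] < temperature <= BIN_EDGES[-1]:
        return None
    # bisect_left places a value equal to an edge in the lower bin, which
    # matches the right-closed "(low_to_high]" intervals in the labels.
    index = bisect_left(BIN_EDGES, temperature)
    return f"TempBin_{index}_({BIN_EDGES[index - 1]}_to_{BIN_EDGES[index]}]"


print(temperature_bin_label(920))  # TempBin_2_(900_to_1100]
print(temperature_bin_label(900))  # TempBin_1_(1_to_900]
```

Under these assumptions, both heating steps in the usage example (920 and 950) fall in TempBin_2_(900_to_1100], the class with the lowest recall in the table.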

2. Tuned Atmosphere Category Prediction Model:

Overall Test Set Accuracy: 0.7193
Overall Test Set F1 Score (Weighted): 0.7174

Per-Class Performance (Test Set):

                          precision    recall  f1-score   support

               Inert       0.59      0.38      0.46       139
    Other_Atm_Target       1.00      0.44      0.62         9
           Oxidizing       0.67      0.71      0.69      1552
            Reducing       0.70      0.47      0.56       100
Unknown_Atm_Category       0.76      0.76      0.76      2098

            accuracy                           0.72      3898
           macro avg       0.74      0.55      0.62      3898
        weighted avg       0.72      0.72      0.72      3898
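
The macro and weighted averages in the atmosphere table differ noticeably (recall 0.55 versus 0.72) because the classes are very unevenly sized. The sketch below recomputes both averages from the per-class rows to show where the gap comes from. The inputs are the rounded values from the table, so results may differ from the original report in later decimals.

```python
# (precision, recall, f1, support) copied from the atmosphere table above.
PER_CLASS = {
    "Inert": (0.59, 0.38, 0.46, 139),
    "Other_Atm_Target": (1.00, 0.44, 0.62, 9),
    "Oxidizing": (0.67, 0.71, 0.69, 1552),
    "Reducing": (0.70, 0.47, 0.56, 100),
    "Unknown_Atm_Category": (0.76, 0.76, 0.76, 2098),
}


def macro_average(column: int) -> float:
    """Unweighted mean of one metric column across classes."""
    return sum(row[column] for row in PER_CLASS.values()) / len(PER_CLASS)


def weighted_average(column: int) -> float:
    """Mean of one metric column weighted by class support."""
    total_support = sum(row[3] for row in PER_CLASS.values())
    weighted_sum = sum(row[column] * row[3] for row in PER_CLASS.values())
    return weighted_sum / total_support


RECALL = 1  # column index of recall in each row
macro_recall = macro_average(RECALL)
weighted_recall = weighted_average(RECALL)
print(f"macro recall:    {macro_recall:.2f}")     # 0.55
print(f"weighted recall: {weighted_recall:.2f}")  # 0.72
```

The weighted figure is dominated by Oxidizing and Unknown_Atm_Category, which together account for 3650 of the 3898 test samples. The macro figure gives equal weight to Inert, Reducing, and Other_Atm_Target, where recall is below 0.5.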
