
By submitting any personal information (e.g., name, contact details), you agree to the collection and processing of this data for the purpose of evaluating access requests for this model. Repository authors will store this data securely and will not share it with third parties without your explicit consent. You retain all rights to your personal information and may request its deletion at any time.

By accessing the repository you agree not to use this model in experiments which may result in harm to human or animal subjects.


ML Conformer Generator

ML Conformer Generator is a shape-constrained molecule generation model that combines an Equivariant Diffusion Model (EDM) and a Graph Convolutional Network (GCN). It generates 3D conformations that are chemically valid and geometrically aligned with a reference shape.


📦 Model Summary

  • Architecture: Equivariant Diffusion Model (EDM) + Graph Convolutional Network (GCN)
  • Training Data: 1.6 million ChEMBL compounds, filtered for molecules with 15–39 heavy atoms
  • Post-Processing: Deterministic standardization pipeline using RDKit with constrained MMFF94 geometry optimization
  • Primary Metric: Shape Tanimoto Similarity
  • Developed by: Denis Sapegin

🚀 Intended Use

  • Non-commercial research in 3D molecular generation
  • Academic/educational use
  • Generation of molecules similar to a reference conformer
  • Generation of molecules matching an arbitrary reference shape

🚫 Out of Scope / Limitations

  • Commercial Use: Not licensed for commercial use without explicit permission.
  • Training Bias: Trained on ChEMBL data — results may be biased toward drug-like molecules and chemistries.
  • Elements Supported: Only the following elements are supported for generation: H, C, N, O, F, P, S, Cl, Br.
  • Molecular Size Limitations:
    • Trained on molecules containing 15–39 heavy atoms.
    • By architectural design, the model can only generate molecules with up to 42 heavy atoms.
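The element and size limits above can be expressed as a quick pre-check, for example to see whether a target chemotype lies within what the model can produce. The sketch below is a hypothetical helper (`limit_warnings` is not part of mlconfgen) that works on plain element symbols; in practice the symbols could be read from RDKit atoms.

```python
from typing import Iterable, List

# Limits as stated on this card; the helper itself is hypothetical.
SUPPORTED_ELEMENTS = {"H", "C", "N", "O", "F", "P", "S", "Cl", "Br"}
TRAINING_MIN, TRAINING_MAX = 15, 39   # heavy-atom range of the training data
MAX_HEAVY_ATOMS = 42                  # architectural upper bound


def limit_warnings(symbols: Iterable[str]) -> List[str]:
    """Return warnings for a molecule given as a list of element symbols."""
    symbols = list(symbols)
    warnings: List[str] = []

    unsupported = sorted(set(symbols) - SUPPORTED_ELEMENTS)
    if unsupported:
        warnings.append(f"unsupported elements: {', '.join(unsupported)}")

    # Heavy atoms are every atom except hydrogen.
    heavy = sum(1 for s in symbols if s != "H")
    if heavy > MAX_HEAVY_ATOMS:
        warnings.append(f"{heavy} heavy atoms exceeds the limit of {MAX_HEAVY_ATOMS}")
    elif not TRAINING_MIN <= heavy <= TRAINING_MAX:
        warnings.append(
            f"{heavy} heavy atoms is outside the {TRAINING_MIN}-{TRAINING_MAX} training range"
        )
    return warnings
```

An empty list means the molecule sits inside the training distribution; a size warning between 40 and 42 heavy atoms flags molecules the architecture allows but the model never saw during training.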

🧪 Evaluation Metrics (100,000 requested samples, 100 denoising steps)

  • Valid molecules (post-standardization, % of requested): 48%
  • 🧬 Chemical novelty: 99.84%
  • 📐 Avg Shape Tanimoto: 53.32%
  • 🎯 Max Shape Tanimoto: 99.69%
  • 🔁 Unique molecules: 99.94%
  • Generation speed: 4.18 valid molecules/sec (NVIDIA H100)
  • 💾 Memory (per thread): up to 4.0 GB
  • 🧬 Fréchet Fingerprint Distance (to ChEMBL): 4.13
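The validity rate and throughput above translate directly into a run budget: at 48% validity each valid molecule costs about 1 / 0.48 ≈ 2.08 requested samples, and at 4.18 valid molecules/sec a batch of 1,000 valid molecules takes about 239 s on an H100. The sketch below encodes that arithmetic. It assumes the benchmark averages carry over to your own reference shapes and hardware, which may not hold; `plan_run` is a hypothetical helper.

```python
import math
from typing import Tuple

# Averages reported on this card (100,000 requested samples, 100 steps, NVIDIA H100).
VALID_FRACTION = 0.48     # valid molecules per requested sample
VALID_PER_SECOND = 4.18   # valid molecules generated per second


def plan_run(target_valid: int) -> Tuple[int, float]:
    """Estimate samples to request and seconds needed for `target_valid` valid molecules."""
    n_requested = math.ceil(target_valid / VALID_FRACTION)
    seconds = target_valid / VALID_PER_SECOND
    return n_requested, seconds


n_requested, seconds = plan_run(1000)
print(n_requested, round(seconds, 1))
```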

🧠 How It Works

Core Components:

  • EDM generates atom coordinates and types under shape constraints
  • GCN predicts adjacency matrices (bonding)
  • RDKit pipeline enforces valence, performs sanitization, and optimizes geometry
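The card states only that the GCN predicts adjacency matrices and that RDKit enforces valence. As a toy illustration of those two steps, the sketch assumes (hypothetically) that the GCN emits, for every atom pair, scores over four bond classes: none, single, double, triple. Bonds are decoded by taking the highest-scoring class per pair, then checked against simple neutral valence caps. `decode_bonds`, `valence_violations` and the class encoding are illustrative names, not mlconfgen API; the actual pipeline relies on RDKit sanitization.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical bond-class encoding: class index -> bond order (0 = no bond).
BOND_ORDERS = [0, 1, 2, 3]

# Typical maximum valences of neutral atoms for the supported elements
# (an assumption: charged and other hypervalent states are ignored).
MAX_VALENCE: Dict[str, int] = {
    "H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "P": 5, "S": 6, "Cl": 1, "Br": 1,
}

Bond = Tuple[int, int, int]  # (atom i, atom j, bond order)


def decode_bonds(scores: Sequence[Sequence[Sequence[float]]]) -> List[Bond]:
    """Pick the highest-scoring bond class for each pair in the upper triangle."""
    n = len(scores)
    bonds: List[Bond] = []
    for i in range(n):
        for j in range(i + 1, n):
            pair = scores[i][j]
            order = BOND_ORDERS[max(range(len(pair)), key=pair.__getitem__)]
            if order:
                bonds.append((i, j, order))
    return bonds


def valence_violations(symbols: Sequence[str], bonds: List[Bond]) -> List[int]:
    """Return indices of atoms whose summed bond order exceeds MAX_VALENCE."""
    total = [0] * len(symbols)
    for i, j, order in bonds:
        total[i] += order
        total[j] += order
    return [k for k, s in enumerate(symbols) if total[k] > MAX_VALENCE[s]]
```

For formaldehyde (C, O, H, H) with a C=O double bond and two C-H single bonds, the decoded bonds give carbon a valence of 4 and no violations.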

Shape Alignment:

Shape similarity is measured as Shape Tanimoto Similarity computed from Gaussian molecular volume overlap.

Hydrogens are excluded from the similarity computation.
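Shape Tanimoto is commonly defined from overlap volumes as T = V_AB / (V_AA + V_BB - V_AB), so identical shapes score 1 and non-overlapping shapes score 0. The sketch below computes it with a first-order Gaussian approximation (one Gaussian per heavy atom, amplitude p = 2√2, width chosen so each Gaussian integrates to the atom's hard-sphere volume) and skips hydrogens as described above. The parametrisation and Bondi radii are assumptions; the card does not say which Gaussian scheme mlconfgen uses, and molecules are assumed to be pre-aligned.

```python
import math
from typing import Sequence, Tuple

Atom = Tuple[str, Tuple[float, float, float]]  # (element, xyz in angstrom)

# Bondi van der Waals radii (angstrom) for the supported heavy elements.
VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "F": 1.47,
              "P": 1.80, "S": 1.80, "Cl": 1.75, "Br": 1.85}
P = 2.0 * math.sqrt(2.0)  # Gaussian amplitude


def _alpha(radius: float) -> float:
    # Width such that P * (pi / alpha)**1.5 equals the sphere volume 4/3 * pi * r**3.
    return math.pi * (3.0 * P / (4.0 * math.pi * radius ** 3)) ** (2.0 / 3.0)


def overlap_volume(a: Sequence[Atom], b: Sequence[Atom]) -> float:
    """Sum of pairwise Gaussian overlap integrals between heavy atoms of a and b."""
    total = 0.0
    for ea, xa in a:
        if ea == "H":
            continue
        ai = _alpha(VDW_RADIUS[ea])
        for eb, xb in b:
            if eb == "H":
                continue
            aj = _alpha(VDW_RADIUS[eb])
            d2 = sum((u - v) ** 2 for u, v in zip(xa, xb))
            total += P * P * math.exp(-ai * aj * d2 / (ai + aj)) * (math.pi / (ai + aj)) ** 1.5
    return total


def shape_tanimoto(a: Sequence[Atom], b: Sequence[Atom]) -> float:
    """Shape Tanimoto of two pre-aligned molecules, hydrogens excluded."""
    v_ab = overlap_volume(a, b)
    return v_ab / (overlap_volume(a, a) + overlap_volume(b, b) - v_ab)
```

Production tools usually add higher-order overlap terms or grid-based volumes and optimise the alignment first; this version only illustrates how the ratio is formed.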


💾 Access & Licensing

The Python package and inference code are available on GitHub under the Apache 2.0 License:

https://github.com/Membrizard/ml_conformer_generator

The trained model weights are available at:

https://huggingface.co/Membrizard/ml_conformer_generator

The weights are licensed under CC BY-NC-ND 4.0.

Use of the trained weights for any profit-generating activity is restricted.

For commercial licensing and inference-as-a-service, contact: Denis Sapegin


Installation

  1. Install the package:

pip install mlconfgen

  2. Download the weights from Hugging Face:

    https://huggingface.co/Membrizard/ml_conformer_generator

PyTorch

edm_moi_chembl_15_39.pt

adj_mat_seer_chembl_15_39.pt

ONNX

edm_moi_chembl_15_39.onnx

adj_mat_seer_chembl_15_39.onnx
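The weight files can also be fetched programmatically with `hf_hub_download` from the `huggingface_hub` package. This is a minimal sketch: the file names are copied from the list above and should be checked against the repository's file listing, and `download_weights` is a hypothetical helper. Because the repository is gated, you must accept its conditions on the model page and authenticate first (for example with `huggingface-cli login` or the `HF_TOKEN` environment variable).

```python
from typing import Dict, List

REPO_ID = "Membrizard/ml_conformer_generator"

# Weight files per backend, as listed on this card.
WEIGHT_FILES: Dict[str, List[str]] = {
    "pytorch": ["edm_moi_chembl_15_39.pt", "adj_mat_seer_chembl_15_39.pt"],
    "onnx": ["edm_moi_chembl_15_39.onnx", "adj_mat_seer_chembl_15_39.onnx"],
}


def download_weights(backend: str = "pytorch", local_dir: str = ".") -> List[str]:
    """Download the weight files for one backend and return their local paths."""
    if backend not in WEIGHT_FILES:
        raise ValueError(f"unknown backend {backend!r}; expected one of {sorted(WEIGHT_FILES)}")

    # Imported lazily so the file table can be used without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return [
        hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=local_dir)
        for name in WEIGHT_FILES[backend]
    ]
```

Calling `download_weights("pytorch")` would place both `.pt` files in the working directory, matching the relative paths used in the Python API examples.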


🐍 Python API

PyTorch

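from rdkit import Chem
from mlconfgen import MLConformerGenerator, evaluate_samples

# Load the PyTorch weights downloaded from the Hugging Face repository
model = MLConformerGenerator(
    edm_weights="./edm_moi_chembl_15_39.pt",
    adj_mat_seer_weights="./adj_mat_seer_chembl_15_39.pt",
    diffusion_steps=100,  # same number of denoising steps as in the evaluation
)

# Reference conformer whose shape the generated molecules should match
reference = Chem.MolFromMolFile('ceyyag.mol')

# Generate 20 candidate molecules shaped like the reference
samples = model.generate_conformers(reference_conformer=reference, n_samples=20, variance=2)

# Align the reference and standardize the generated samples for evaluation
aligned_reference, std_samples = evaluate_samples(reference, samples)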

ONNX

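from mlconfgen import MLConformerGeneratorONNX
from rdkit import Chem

# Load the ONNX weights downloaded from the Hugging Face repository
model = MLConformerGeneratorONNX(
    egnn_onnx="./egnn_chembl_15_39.onnx",
    adj_mat_seer_onnx="./adj_mat_seer_chembl_15_39.onnx",
    diffusion_steps=100,
)

# Reference conformer whose shape the generated molecules should match
reference = Chem.MolFromMolFile('ceyyag.mol')

# Generate 20 candidate molecules shaped like the reference
samples = model.generate_conformers(reference_conformer=reference, n_samples=20, variance=2)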