arxiv:2406.12056

Learning Molecular Representation in a Cell

Published on Jun 17 · Submitted by liuganghuggingface on Jun 24

Abstract

Predicting drug efficacy and safety in vivo requires information on biological responses (e.g., cell morphology and gene expression) to small molecule perturbations. However, current molecular representation learning methods do not provide a comprehensive view of cell states under these perturbations and struggle to remove noise, hindering model generalization. We introduce the Information Alignment (InfoAlign) approach to learn molecular representations through the information bottleneck method in cells. We integrate molecules and cellular response data as nodes into a context graph, connecting them with weighted edges based on chemical, biological, and computational criteria. For each molecule in a training batch, InfoAlign optimizes the encoder's latent representation with a minimality objective to discard redundant structural information. A sufficiency objective decodes the representation to align with different feature spaces from the molecule's neighborhood in the context graph. We demonstrate that the proposed sufficiency objective for alignment is tighter than existing encoder-based contrastive methods. Empirically, we validate representations from InfoAlign in two downstream tasks: molecular property prediction against up to 19 baseline methods across four datasets, plus zero-shot molecule-morphology matching.
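The two objectives described in the abstract can be illustrated with a toy loss function. What follows is only a minimal sketch under stated assumptions, not the InfoAlign implementation: it assumes a Gaussian encoder whose minimality term is a KL divergence to a standard normal prior, and a sufficiency term computed as an edge-weighted mean squared error between decoded features and the features of context-graph neighbors. The abstract does not specify either form. The function names (`kl_to_standard_normal`, `weighted_alignment_loss`, `bottleneck_loss`) and the trade-off weight `beta` are hypothetical and chosen for illustration.

```python
import numpy as np


def kl_to_standard_normal(mu, log_var):
    """Per-row KL( N(mu, diag(exp(log_var))) || N(0, I) ).

    Assumed stand-in for a minimality term: it penalizes information
    kept in the latent representation.
    """
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)


def weighted_alignment_loss(decoded, targets, edge_weights):
    """Edge-weighted mean squared error between decoded and neighbor features.

    Assumed stand-in for a sufficiency term: each row is one neighbor in
    the context graph, weighted by its (hypothetical) edge weight.
    """
    per_neighbor = np.mean((decoded - targets) ** 2, axis=1)
    return np.sum(edge_weights * per_neighbor) / np.sum(edge_weights)


def bottleneck_loss(mu, log_var, decoded, targets, edge_weights, beta=0.1):
    """Sufficiency term plus beta times the minimality term (illustrative)."""
    minimality = np.mean(kl_to_standard_normal(mu, log_var))
    sufficiency = weighted_alignment_loss(decoded, targets, edge_weights)
    return sufficiency + beta * minimality


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy shapes: 4 neighbors, 8-dimensional latent, 16-dimensional features.
    mu = rng.normal(size=(4, 8))
    log_var = rng.normal(scale=0.1, size=(4, 8))
    decoded = rng.normal(size=(4, 16))
    targets = rng.normal(size=(4, 16))
    edge_weights = np.array([1.0, 0.5, 0.25, 0.25])
    print(bottleneck_loss(mu, log_var, decoded, targets, edge_weights))
```

In this sketch, a larger `beta` pushes the latent representation toward the prior (discarding more information), while a smaller `beta` favors matching the neighbor features. How the paper actually balances the two objectives is not stated in the abstract.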

Community


Why Molecular Structures Are Insufficient and What's the Future of Molecular Representation Learning? For a long time, molecular structures like sequences and graphs have been the sole focus of molecular representation learning—but they aren't enough. Molecular functions also depend on contextual factors within biological systems. For accurate in vivo bioactivity predictions, we must learn from biological responses in cells. Discover new advancements in our latest research, Learning Molecular Representation in a Cell, with code.
