---
license: mit
inference: false
datasets:
- ibm-research/otter_uniprot_bindingdb
---

# Otter UB CB Model Card

Otter-Knowledge model trained using a single modality for molecules: ChemBERTa (CB).

## Model details
Otter models are based on Graph Neural Networks (GNNs) that propagate initial embeddings through a set of layers, each of which updates a node's embedding according to its neighbours.
The GNN architecture consists of two main blocks: an encoder and a decoder.
- The encoder starts with a projection layer, a set of linear transformations (one per node modality) that project nodes into a common dimensionality. It then applies several multi-relational graph convolutional layers (R-GCN), which distinguish between the types of edges connecting source and target nodes by keeping a separate set of trainable parameters for each edge type.
- The decoder addresses a link prediction task: a scoring function maps each triple (source node, edge, target node) to a scalar in the interval [0, 1].
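
The two blocks described above can be sketched in plain PyTorch. This is an illustrative toy, not the Otter implementation: the class names (`TinyRGCNLayer`, `TinyEncoder`, `DistMultDecoder`), the input sizes, the sum aggregation, and the choice of a DistMult-style score squashed by a sigmoid are all assumptions made for the example.

```python
import torch
from torch import nn


class TinyRGCNLayer(nn.Module):
    """One relational graph convolution with a separate weight matrix per edge type."""

    def __init__(self, dim, num_relations):
        super().__init__()
        self.rel_weights = nn.Parameter(torch.randn(num_relations, dim, dim) * 0.01)
        self.self_loop = nn.Linear(dim, dim)

    def forward(self, x, src, rel, dst):
        # Transform each source embedding with the matrix of its edge type.
        messages = torch.bmm(x[src].unsqueeze(1), self.rel_weights[rel]).squeeze(1)
        # Sum incoming messages per target node (simplified: no degree normalisation).
        aggregated = torch.zeros_like(x).index_add(0, dst, messages)
        return torch.relu(aggregated + self.self_loop(x))


class TinyEncoder(nn.Module):
    """Per-modality projection into a common size, followed by R-GCN-style layers."""

    def __init__(self, modality_dims, dim, num_relations, num_layers=2):
        super().__init__()
        self.dim = dim
        self.project = nn.ModuleDict(
            {name: nn.Linear(d, dim) for name, d in modality_dims.items()})
        self.layers = nn.ModuleList(
            [TinyRGCNLayer(dim, num_relations) for _ in range(num_layers)])

    def forward(self, features, num_nodes, src, rel, dst):
        # features maps a modality name to (node indices, raw embeddings).
        x = torch.zeros(num_nodes, self.dim)
        for name, (idx, emb) in features.items():
            x[idx] = self.project[name](emb)
        for layer in self.layers:
            x = layer(x, src, rel, dst)
        return x


class DistMultDecoder(nn.Module):
    """Scores (source, relation, target) triples; the sigmoid keeps scores in [0, 1]."""

    def __init__(self, dim, num_relations):
        super().__init__()
        self.rel = nn.Parameter(torch.randn(num_relations, dim))

    def forward(self, h, src, rel, dst):
        return torch.sigmoid((h[src] * self.rel[rel] * h[dst]).sum(dim=-1))


torch.manual_seed(0)
# Hypothetical modalities with arbitrary input sizes (16 and 24).
encoder = TinyEncoder({"drug": 16, "protein": 24}, dim=8, num_relations=2)
decoder = DistMultDecoder(dim=8, num_relations=2)
features = {
    "drug": (torch.tensor([0, 1]), torch.randn(2, 16)),
    "protein": (torch.tensor([2]), torch.randn(1, 24)),
}
src = torch.tensor([0, 1, 2])
rel = torch.tensor([0, 0, 1])
dst = torch.tensor([2, 2, 0])
h = encoder(features, num_nodes=3, src=src, rel=rel, dst=dst)
scores = decoder(h, src, rel, dst)
print(h.shape, scores.shape)
```

Each edge type indexes its own weight matrix in `rel_weights`, which is the property that lets the layers treat different relations differently.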


**Model training data:**

The model was trained on *Uniprot-BindingDB*.


**Paper or resources for more information:**
- [GitHub Repo](https://github.com/IBM/otter-knowledge)
- [Paper](https://arxiv.org/abs/2306.12802)

**License:**

MIT

**Where to send questions or comments about the model:**
- [GitHub Repo](https://github.com/IBM/otter-knowledge)

## How to use

Clone the repo:
```sh
git clone https://github.com/IBM/otter-knowledge.git
cd otter-knowledge
```

- Use the `BindingAffinity` class:
  
```python
import torch
from torch import nn


class BindingAffinity(nn.Module):

    def __init__(self, gnn, drug_modality):
        super().__init__()
        self.drug_modality = drug_modality
        self.protein_modality = 'protein-sequence-mean'
        self.drug_entity_name = 'Drug'
        self.protein_entity_name = 'Protein'
        self.drug_rel_id = 1
        self.protein_rel_id = 2
        self.protein_drug_rel_id = 0
        self.gnn = gnn
        self.device = 'cpu'
        hd1 = 512
        num_input = 2
        # MLP head: concatenated drug and protein embeddings (2 * hd1) -> one scalar
        self.combine = torch.nn.ModuleList([nn.Linear(num_input * hd1, hd1), nn.ReLU(),
                                            nn.Linear(hd1, hd1), nn.ReLU(),
                                            nn.Linear(hd1, 1)])
        self.to(self.device)

    def forward(self, drug_embedding, protein_embedding):
        nodes = {
            self.drug_modality: {
                'embeddings': drug_embedding.unsqueeze(0).to(self.device),
                'node_indices': torch.tensor([1]).to(self.device)
            },
            self.drug_entity_name: {
                'embeddings': [None],
                'node_indices': torch.tensor([0]).to(self.device)
            },
            self.protein_modality: {
                'embeddings': protein_embedding.unsqueeze(0).to(self.device),
                'node_indices': torch.tensor([3]).to(self.device)
            },
            self.protein_entity_name: {
                'embeddings': [None],
                'node_indices': torch.tensor([2]).to(self.device)
            }
        }
        triples = torch.tensor([[1, 3],
                                [3, 4],
                                [0, 2]]).to(self.device)
        gnn_embeddings = self.gnn.encoder(nodes, triples)
        node_gnn_embeddings = []
        # Rows to read back: the Drug (index 0) and Protein (index 2) entity nodes
        all_indices = [0, 2]

        for indices in all_indices:
            node_gnn_embedding = torch.index_select(gnn_embeddings, dim=0, index=torch.tensor(indices).to(self.device))
            node_gnn_embeddings.append(node_gnn_embedding)

        # Concatenate both embeddings and pass them through the MLP head
        c = torch.cat(node_gnn_embeddings, dim=-1)
        for m in self.combine:
            c = m(c)

        return c
```

- Run inference with the initial embeddings, i.e. the embeddings produced by the handlers (ChemBERTa for the SMILES string, ESM1b for the protein sequence). Here `net` is a `BindingAffinity` instance:

```python
p = net(drug_embedding=drug_embedding, protein_embedding=protein_embedding)
print(p)
```
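
The shape of `p` can be checked without loading the trained GNN. The sketch below replays only the readout at the end of `forward`, under two assumptions: the encoder returns one 512-dimensional row per node (four nodes here), and a random tensor stands in for that output. `nn.Sequential` replaces the explicit loop over `self.combine`; the weights are untrained, so only the shapes are meaningful.

```python
import torch
from torch import nn

hd1 = 512  # hidden size used by BindingAffinity

# Stand-in for self.gnn.encoder(nodes, triples): one row per node (assumed layout).
gnn_embeddings = torch.randn(4, hd1)

# Same readout as in forward: row 0 (Drug entity) and row 2 (Protein entity).
selected = [torch.index_select(gnn_embeddings, dim=0, index=torch.tensor([i]))
            for i in (0, 2)]
c = torch.cat(selected, dim=-1)

# Untrained copy of the MLP head, for shape checking only.
combine = nn.Sequential(nn.Linear(2 * hd1, hd1), nn.ReLU(),
                        nn.Linear(hd1, hd1), nn.ReLU(),
                        nn.Linear(hd1, 1))
p = combine(c)
print(c.shape, p.shape)  # torch.Size([1, 1024]) torch.Size([1, 1])
```

Under these assumptions the model returns a single value of shape `(1, 1)` per drug-protein pair.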