Update README.md
README.md
CHANGED
@@ -4,3 +4,33 @@ base_model:
|
|
4 |
license: other
|
5 |
---
|
6 |
LoRA fine-tuned checkpoint for protein inverse folding.
|
7 |
+
|
8 |
+
# Protein Inverse Folding
|
9 |
+
We fine-tune the [AIDO.Protein-16B](https://huggingface.co/genbio-ai/AIDO.DNA-16B) model with LoRA on the [CATH 4.2](https://pubmed.ncbi.nlm.nih.gov/9309224/) benchmark dataset. We use the same train, validation, and test splits as previous studies such as [LM-Design](https://arxiv.org/abs/2302.01649) and [DPLM](https://arxiv.org/abs/2402.18567). The current version of ModelGenerator contains the inference pipeline for protein inverse folding. Experimental pipelines for other datasets (both training and testing) will be included in the future.
|
10 |
+
|
11 |
+
#### Setup
|
12 |
+
Install [ModelGenerator](https://github.com/genbio-ai/modelgenerator).
|
13 |
+
|
14 |
+
#### Running inference:
|
15 |
+
|
16 |
+
- Set the environment variable for ModelGenerator's data directory:
|
17 |
+
```
export MGEN_DATA_DIR=~/mgen_data # or any other local directory of your choice
```
|
20 |
+
- Download the `model.ckpt` checkpoint from [here](https://huggingface.co/genbio-ai/AIDO.Protein-16B-inv-fold). Place it inside the directory `${MGEN_DATA_DIR}/modelgenerator/huggingface_models/protein_inv_fold/AIDO.Protein-16B-inverse_folding/`.
|
21 |
+
|
22 |
+
- Download the CATH 4.2 dataset preprocessed by [Generative Models for Graph-Based Protein Design (Ingraham et al., NeurIPS'19)](https://papers.nips.cc/paper_files/paper/2019/file/f3a4ff4839c56a5f460c88cce3666a2b-Paper.pdf) from [here](http://people.csail.mit.edu/ingraham/graph-protein-design/data/cath/). You should find two files named `chain_set.jsonl` and `chain_set_splits.json`. Place them inside the directory `${MGEN_DATA_DIR}/modelgenerator/datasets/protein_inv_fold/cath_4.2/`.
|
23 |
+
|
24 |
+
- Then run the following bash script in `experiments/AIDO.Protein` under your installation of [ModelGenerator](https://github.com/genbio-ai/modelgenerator):
|
25 |
+
```
bash prot_inverse_folding.sh
```
|
28 |
+
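Before launching the script, it can help to confirm that the downloaded files sit where the steps above place them. The following is a minimal sketch and not part of ModelGenerator: the helper name `missing_inputs` is hypothetical, and the relative paths are simply the ones listed above, resolved against `MGEN_DATA_DIR` (falling back to `~/mgen_data` as in the example export).

```python
import os
from pathlib import Path

# Relative paths copied from the steps above; nothing else is assumed
# about ModelGenerator's internal directory layout.
EXPECTED_FILES = [
    "modelgenerator/huggingface_models/protein_inv_fold/AIDO.Protein-16B-inverse_folding/model.ckpt",
    "modelgenerator/datasets/protein_inv_fold/cath_4.2/chain_set.jsonl",
    "modelgenerator/datasets/protein_inv_fold/cath_4.2/chain_set_splits.json",
]


def missing_inputs(data_dir):
    """Return the expected files that are absent under data_dir (hypothetical helper)."""
    root = Path(data_dir).expanduser()
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    data_dir = os.environ.get("MGEN_DATA_DIR", "~/mgen_data")
    missing = missing_inputs(data_dir)
    if missing:
        print("Missing files under", data_dir)
        for rel in missing:
            print("  -", rel)
    else:
        print("All expected files found under", data_dir)
```

An empty list from `missing_inputs` only means the files exist; it does not verify that the checkpoint or dataset contents are valid.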
|
29 |
+
#### Outputs:
|
30 |
+
- The evaluation score will be printed on the console.
|
31 |
+
- The generated sequences will be stored in `./proteinIF_outputs/designed_sequences.pkl`.
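
To look at the designed sequences afterwards, the file can be opened with Python's standard `pickle` module. This is a minimal sketch: the internal structure of `designed_sequences.pkl` is not described here, so the code reports only the top-level type and size instead of assuming a schema, and `load_designed_sequences` and `describe` are hypothetical helper names. Since `pickle.load` can execute arbitrary code, use it only on files you generated yourself.

```python
import pickle
from pathlib import Path


def load_designed_sequences(path="./proteinIF_outputs/designed_sequences.pkl"):
    """Load the pickled output (hypothetical helper; no schema is assumed)."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def describe(obj):
    """Summarize the top-level object without assuming its structure."""
    size = len(obj) if hasattr(obj, "__len__") else "n/a"
    return f"type={type(obj).__name__}, size={size}"


if __name__ == "__main__":
    out = Path("./proteinIF_outputs/designed_sequences.pkl")
    if out.is_file():
        print(describe(load_designed_sequences(out)))
    else:
        print(f"{out} not found; run the inference script first.")
```

If the pickle holds objects from third-party packages, loading it will likely need the same Python environment that ran the inference script.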
|
32 |
+
|
33 |
+
|
34 |
+
|
35 |
+
#### Note:
|
36 |
+
- Multi-GPU inference for inverse folding is not yet supported; support will be added in the future.
|