probablybots committed on
Commit 38e2b89 · verified · 1 Parent(s): f530586

Update README.md

Files changed (1): README.md (+34, -20)
README.md CHANGED
1
# AIDO.Protein-16B-v1

AIDO.Protein-16B-v1 continues the pre-training of [AIDO.Protein-16B](https://huggingface.co/genbio-ai/AIDO.Protein-16B) using an additional 100 billion amino acids from UniRef90.

## How to Use
### Build any downstream model from this backbone with ModelGenerator
For more information, visit: [Model Generator](https://github.com/genbio-ai/modelgenerator)
```bash
mgen fit --model SequenceClassification --model.backbone aido_protein_16b_v1 --data SequenceClassificationDataModule --data.path <hf_or_local_path_to_your_dataset>
mgen test --model SequenceClassification --model.backbone aido_protein_16b_v1 --data SequenceClassificationDataModule --data.path <hf_or_local_path_to_your_dataset>
```

### Or use directly in Python
#### Embedding
```python
from modelgenerator.tasks import Embed
model = Embed.from_config({"model.backbone": "aido_protein_16b_v1"}).eval()
collated_batch = model.collate({"sequences": ["HELLQ", "WRLD"]})
embedding = model(collated_batch)
print(embedding.shape)
print(embedding)
```
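The `Embed` task returns per-token embeddings. If you need a single vector per sequence, mean pooling over the token axis is one common choice. A minimal sketch with a stand-in tensor that only mimics the assumed output shape `(batch, tokens, hidden)`; the real hidden size of AIDO.Protein-16B-v1 is much larger:

```python
import torch

# Stand-in for `model(collated_batch)`: 2 sequences, 5 tokens, hidden size 8.
# Shapes are illustrative only, not the model's actual dimensions.
embedding = torch.randn(2, 5, 8)

# Mean-pool across the token dimension to get one embedding per sequence.
sequence_embedding = embedding.mean(dim=1)
print(sequence_embedding.shape)  # torch.Size([2, 8])
```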
 
#### Sequence Level Classification
```python
import torch
from modelgenerator.tasks import SequenceClassification
model = SequenceClassification.from_config({"model.backbone": "aido_protein_16b_v1", "model.n_classes": 2}).eval()
collated_batch = model.collate({"sequences": ["HELLQ", "WRLD"]})
logits = model(collated_batch)
print(logits)
print(torch.argmax(logits, dim=-1))
```
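The classification example prints raw logits; if you want class probabilities rather than hard predictions, apply a softmax first. A sketch with stand-in logits (in practice these come from the model call above):

```python
import torch

# Stand-in for `model(collated_batch)`: 2 sequences x 2 classes.
logits = torch.tensor([[2.0, -1.0], [0.5, 1.5]])

probs = torch.softmax(logits, dim=-1)  # each row sums to 1
preds = torch.argmax(probs, dim=-1)    # same result as argmax over raw logits
print(probs)
print(preds)  # tensor([0, 1])
```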
 
#### Token Level Classification
```python
import torch
from modelgenerator.tasks import TokenClassification
model = TokenClassification.from_config({"model.backbone": "aido_protein_16b_v1", "model.n_classes": 3}).eval()
collated_batch = model.collate({"sequences": ["HELLQ", "WRLD"]})
logits = model(collated_batch)
print(logits)
print(torch.argmax(logits, dim=-1))
```
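Token-level predictions are per-residue class indices; a typical post-processing step is mapping them back to label names. A sketch with stand-in predictions and made-up labels (the three class names below are purely hypothetical; use whatever your own dataset defines):

```python
import torch

# Stand-in for `torch.argmax(logits, dim=-1)`: 2 sequences x 4 tokens, 3 classes.
preds = torch.tensor([[0, 2, 1, 0], [1, 1, 0, 2]])

labels = ["helix", "sheet", "coil"]  # hypothetical class names
decoded = [[labels[i] for i in row] for row in preds.tolist()]
print(decoded)  # [['helix', 'coil', 'sheet', 'helix'], ['sheet', 'sheet', 'helix', 'coil']]
```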
#### Regression
```python
from modelgenerator.tasks import SequenceRegression
model = SequenceRegression.from_config({"model.backbone": "aido_protein_16b_v1"}).eval()
collated_batch = model.collate({"sequences": ["HELLQ", "WRLD"]})
logits = model(collated_batch)
print(logits)
```
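A quick way to sanity-check regression outputs against known targets is mean squared error. A sketch with stand-in values (in practice the predictions come from the `SequenceRegression` call above; the targets here are invented):

```python
import torch

# Stand-in predictions and hypothetical ground-truth targets.
preds = torch.tensor([0.8, 1.9])
targets = torch.tensor([1.0, 2.0])

mse = torch.nn.functional.mse_loss(preds, targets)
print(mse)  # tensor(0.0250)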
 

## Citation
Please cite AIDO.Protein using the following BibTeX entry:
```bibtex
@inproceedings{sun_mixture_2024,
  title = {Mixture of Experts Enable Efficient and Effective Protein Understanding and Design},
  url = {https://www.biorxiv.org/content/10.1101/2024.11.29.625425v1},
  doi = {10.1101/2024.11.29.625425},
  publisher = {bioRxiv},
  author = {Sun, Ning and Zou, Shuxian and Tao, Tianhua and Mahbub, Sazan and Li, Dian and Zhuang, Yonghao and Wang, Hongyi and Cheng, Xingyi and Song, Le and Xing, Eric P.},
  year = {2024},
  booktitle = {NeurIPS 2024 Workshop on AI for New Drug Modalities},
}
```