ShuxianZou committed
Commit 56462eb · verified · 1 Parent(s): f15ce14

Update README.md

Files changed (1)
  1. README.md +25 -17
README.md CHANGED
@@ -1,53 +1,61 @@
  ---
  base_model:
- - genbio-ai/rnafm-1.6b
+ - genbio-ai/AIDO.RNA-1.6B
  ---
- # AIDO.RNA 1.6B CDS
+ # AIDO.RNA-1.6B-CDS

- AIDO.RNA 1.6B CDS is a domain adaptation model on the coding sequences. It was pre-trained on 9 million coding sequences starting with the AIDO.RNA 1.6B model.
+ AIDO.RNA-1.6B-CDS is a domain adaptation model on the coding sequences. It was pre-trained on 9 million coding sequences released by Carlos et al. (2024) [1] based on our [AIDO.RNA-1.6B](https://huggingface.co/genbio-ai/AIDO.RNA-1.6B) model.

  ## How to Use
  ### Build any downstream models from this backbone
  #### Embedding
  ```python
  from genbio_finetune.tasks import Embed
- model = Embed.from_config({"model.backbone": "rnafm_cds"}).eval()
+ model = Embed.from_config({"model.backbone": "aido_rna_1b600m_cds"}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  embedding = model(collated_batch)
  print(embedding.shape)
  print(embedding)
  ```
+
+ #### Regression
+ ```python
+ from genbio_finetune.tasks import SequenceRegression
+ model = SequenceRegression.from_config({"model.backbone": "aido_rna_1b600m_cds"}).eval()
+ collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
+ logits = model(collated_batch)
+ print(logits)
+ ```
+
  #### Sequence Level Classification
  ```python
  import torch
  from genbio_finetune.tasks import SequenceClassification
- model = SequenceClassification.from_config({"model.backbone": "rnafm_cds", "model.n_classes": 2}).eval()
+ model = SequenceClassification.from_config({"model.backbone": "aido_rna_1b600m_cds", "model.n_classes": 2}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  logits = model(collated_batch)
  print(logits)
  print(torch.argmax(logits, dim=-1))
  ```
+
  #### Token Level Classification
  ```python
  import torch
  from genbio_finetune.tasks import TokenClassification
- model = TokenClassification.from_config({"model.backbone": "rnafm_cds", "model.n_classes": 3}).eval()
+ model = TokenClassification.from_config({"model.backbone": "aido_rna_1b600m_cds", "model.n_classes": 3}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  logits = model(collated_batch)
  print(logits)
  print(torch.argmax(logits, dim=-1))
  ```
- #### Regression
- ```python
- from genbio_finetune.tasks import SequenceRegression
- model = SequenceRegression.from_config({"model.backbone": "rnafm_cds"}).eval()
- collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
- logits = model(collated_batch)
- print(logits)
- ```
+
  #### Or use our one-liner CLI to finetune or evaluate any of the above!
  ```
- gbft fit --model SequenceClassification --model.backbone rnafm_cds --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
- gbft test --model SequenceClassification --model.backbone rnafm_cds --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
+ mgen fit --model SequenceClassification --model.backbone aido_rna_1b600m_cds --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
+ mgen test --model SequenceClassification --model.backbone aido_rna_1b600m_cds --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
  ```
- For more information, visit: [Model Generator](https://github.com/genbio-ai/modelgenerator)
+ For more information, visit: [ModelGenerator](https://github.com/genbio-ai/modelgenerator)
+
+
+ ## Reference
+ 1. Carlos Outeiral and Charlotte M Deane. Codon language embeddings provide strong signals for use in protein engineering. Nature Machine Intelligence, 6(2):170–179, 2024.
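
This commit renames three identifiers in the README: the backbone key `rnafm_cds` becomes `aido_rna_1b600m_cds`, the base model `genbio-ai/rnafm-1.6b` becomes `genbio-ai/AIDO.RNA-1.6B`, and the CLI entry point `gbft` becomes `mgen`. For anyone with scripts written against the old README, the following is a minimal sketch of a text migration covering just those renames. The helper name `migrate` and the `RENAMES` table are hypothetical and not part of ModelGenerator; the Python import path `genbio_finetune` is left alone because the diff does not change it.

```python
import re

# Old -> new identifiers, taken only from the README diff in this commit.
RENAMES = {
    "rnafm_cds": "aido_rna_1b600m_cds",
    "genbio-ai/rnafm-1.6b": "genbio-ai/AIDO.RNA-1.6B",
    "gbft": "mgen",
}


def migrate(text: str) -> str:
    """Rewrite old identifiers to their new names (hypothetical helper).

    Only whole tokens are replaced, so an identifier that merely contains
    an old name (for example `my_rnafm_cds_v2`) is left untouched.
    """
    for old, new in RENAMES.items():
        # The lookarounds reject matches glued to word characters, dots,
        # or hyphens on the left, and word characters or hyphens on the right.
        pattern = r"(?<![\w.-])" + re.escape(old) + r"(?![\w-])"
        text = re.sub(pattern, new, text)
    return text


if __name__ == "__main__":
    old_cmd = "gbft fit --model SequenceClassification --model.backbone rnafm_cds"
    print(migrate(old_cmd))
```

Running `migrate` on text that already uses the new names leaves it unchanged, so the sketch can be applied repeatedly to the same file.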