probablybots committed on
Commit
c6fc380
·
verified ·
1 Parent(s): 83349cc

Update README.md

  1. README.md +34 -20
README.md CHANGED
@@ -1,49 +1,63 @@
- # AIDO.RNA 1M

- AIDO.RNA 1M is a 1 million parameter RNA foundation model pre-trained on 886 million RNA sequences from the MARS database.

  ## How to Use
- ### Build any downstream models from this backbone
  #### Embedding
  ```python
- from genbio_finetune.tasks import Embed
- model = Embed.from_config({"model.backbone": "rnafm_1m"}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  embedding = model(collated_batch)
  print(embedding.shape)
  print(embedding)
  ```
- #### Sequence Level Classification
  ```python
  import torch
- from genbio_finetune.tasks import SequenceClassification
- model = SequenceClassification.from_config({"model.backbone": "rnafm_1m", "model.n_classes": 2}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  logits = model(collated_batch)
  print(logits)
  print(torch.argmax(logits, dim=-1))
  ```
- #### Token Level Classification
  ```python
  import torch
- from genbio_finetune.tasks import TokenClassification
- model = TokenClassification.from_config({"model.backbone": "rnafm_1m", "model.n_classes": 3}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  logits = model(collated_batch)
  print(logits)
  print(torch.argmax(logits, dim=-1))
  ```
- #### Regression
  ```python
- from genbio_finetune.tasks import SequenceRegression
- model = SequenceRegression.from_config({"model.backbone": "rnafm_1m"}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  logits = model(collated_batch)
  print(logits)
  ```
- #### Or use our one-liner CLI to finetune or evaluate any of the above!
- ```
- gbft fit --model SequenceClassification --model.backbone rnafm_1m --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
- gbft test --model SequenceClassification --model.backbone rnafm_1m --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
- ```
- For more information, visit: [Model Generator](https://github.com/genbio-ai/modelgenerator)
+ # AIDO.RNA-1M-MARS
+
+ AIDO.RNA-1M-MARS is a 1 million parameter RNA foundation model pre-trained on 886 million RNA sequences from the MARS database.
+
+ For a more detailed description, refer to the SOTA model in this collection: https://huggingface.co/genbio-ai/AIDO.RNA-1.6B


  ## How to Use
+ ### Build any downstream models from this backbone with ModelGenerator
+ For more information, visit: [Model Generator](https://github.com/genbio-ai/modelgenerator)
+ ```bash
+ mgen fit --model SequenceClassification --model.backbone aido_rna_1m_mars --data SequenceClassificationDataModule --data.path <hf_or_local_path_to_your_dataset>
+ mgen test --model SequenceClassification --model.backbone aido_rna_1m_mars --data SequenceClassificationDataModule --data.path <hf_or_local_path_to_your_dataset>
+ ```
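
The `mgen fit` and `mgen test` commands above differ only in the subcommand, so a script that runs both can build them from one argument list. The sketch below is not part of the commit or of ModelGenerator: `build_mgen_command` is a hypothetical helper and `my_dataset` is a placeholder path. The flags are copied unchanged from the commands above.

```python
import shlex


def build_mgen_command(stage, data_path, backbone="aido_rna_1m_mars"):
    """Return the argument list for one of the README's mgen commands.

    Hypothetical helper for illustration; only the flags shown in the
    README are used, and nothing is executed here.
    """
    if stage not in {"fit", "test"}:
        raise ValueError(f"stage must be 'fit' or 'test', got {stage!r}")
    return [
        "mgen", stage,
        "--model", "SequenceClassification",
        "--model.backbone", backbone,
        "--data", "SequenceClassificationDataModule",
        "--data.path", data_path,
    ]


# "my_dataset" is a placeholder for a Hugging Face or local dataset path.
fit_cmd = build_mgen_command("fit", "my_dataset")
test_cmd = build_mgen_command("test", "my_dataset")

# shlex.join renders each list as a shell-safe command line.
print(shlex.join(fit_cmd))
print(shlex.join(test_cmd))
```

Passing the lists to `subprocess.run(fit_cmd, check=True)` would run the commands in order, assuming `mgen` is installed and on the `PATH`.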
+
+ ### Or use directly in Python
  #### Embedding
  ```python
+ from modelgenerator.tasks import Embed
+ model = Embed.from_config({"model.backbone": "aido_rna_1m_mars"}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  embedding = model(collated_batch)
  print(embedding.shape)
  print(embedding)
  ```
+ #### Sequence-level Classification
  ```python
  import torch
+ from modelgenerator.tasks import SequenceClassification
+ model = SequenceClassification.from_config({"model.backbone": "aido_rna_1m_mars", "model.n_classes": 2}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  logits = model(collated_batch)
  print(logits)
  print(torch.argmax(logits, dim=-1))
  ```
+ #### Token-level Classification
  ```python
  import torch
+ from modelgenerator.tasks import TokenClassification
+ model = TokenClassification.from_config({"model.backbone": "aido_rna_1m_mars", "model.n_classes": 3}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  logits = model(collated_batch)
  print(logits)
  print(torch.argmax(logits, dim=-1))
  ```
+ #### Sequence-level Regression
  ```python
+ from modelgenerator.tasks import SequenceRegression
+ model = SequenceRegression.from_config({"model.backbone": "aido_rna_1m_mars"}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  logits = model(collated_batch)
  print(logits)
  ```
+
+ ### Get RNA sequence embedding
+ ```python
+ from modelgenerator.tasks import Embed
+ model = Embed.from_config({"model.backbone": "aido_rna_1m_mars"}).eval()
+ collated_batch = model.collate({"sequences": ["ACGT", "ACGT"]})
+ embedding = model(collated_batch)
+ print(embedding.shape)
+ print(embedding)
+ ```
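
The embedding examples print the raw output tensor. If that tensor holds one vector per token, shaped `(batch, sequence_length, hidden_size)`, a common next step is to average over the token dimension to get one vector per sequence. The sketch below shows that step with a padding mask. The shape is an assumption (the README does not state it), `masked_mean_pool` is a hypothetical helper rather than a ModelGenerator function, and a random tensor stands in for the model output.

```python
import torch


def masked_mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sequence, ignoring padded positions.

    token_embeddings: (batch, seq_len, hidden), assumed layout.
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding.
    """
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # (B, L, 1)
    summed = (token_embeddings * mask).sum(dim=1)                   # (B, H)
    counts = mask.sum(dim=1).clamp(min=1.0)                         # avoid divide-by-zero
    return summed / counts


# Stand-in for a model output: 2 sequences, 6 positions, 8 hidden units.
torch.manual_seed(0)
token_embeddings = torch.randn(2, 6, 8)
# The second sequence has 4 real tokens followed by 2 padded positions.
attention_mask = torch.tensor([[1, 1, 1, 1, 1, 1],
                               [1, 1, 1, 1, 0, 0]])

pooled = masked_mean_pool(token_embeddings, attention_mask)
print(pooled.shape)
```

Whether special tokens should be excluded from the average depends on the tokenizer, which this sketch does not model.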