genbio-ai/AIDO.RNA-1.6B-CDS

PyTorch

rnabert


ShuxianZou committed on Dec 6, 2024

Commit 56462eb (verified) · 1 parent: f15ce14

Update README.md


Files changed (1)

README.md +25 -17

README.md CHANGED

@@ -1,53 +1,61 @@
 ---
 base_model:
-- genbio-ai/rnafm-1.6b
 ---
-# AIDO.RNA 1.6B CDS
-AIDO.RNA 1.6B CDS is a domain adaptation model on the coding sequences. It was pre-trained on 9 million coding sequences starting with the AIDO.RNA 1.6B model.
 ## How to Use
 ### Build any downstream models from this backbone
 #### Embedding
 ```python
 from genbio_finetune.tasks import Embed
-model = Embed.from_config({"model.backbone": "rnafm_cds"}).eval()
 collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
 embedding = model(collated_batch)
 print(embedding.shape)
 print(embedding)
 ```
 #### Sequence Level Classification
 ```python
 import torch
 from genbio_finetune.tasks import SequenceClassification
-model = SequenceClassification.from_config({"model.backbone": "rnafm_cds", "model.n_classes": 2}).eval()
 collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
 logits = model(collated_batch)
 print(logits)
 print(torch.argmax(logits, dim=-1))
 ```
 #### Token Level Classification
 ```python
 import torch
 from genbio_finetune.tasks import TokenClassification
-model = TokenClassification.from_config({"model.backbone": "rnafm_cds", "model.n_classes": 3}).eval()
 collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
 logits = model(collated_batch)
 print(logits)
 print(torch.argmax(logits, dim=-1))
 ```
-#### Regression
-```python
-from genbio_finetune.tasks import SequenceRegression
-model = SequenceRegression.from_config({"model.backbone": "rnafm_cds"}).eval()
-collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
-logits = model(collated_batch)
-print(logits)
-```
 #### Or use our one-liner CLI to finetune or evaluate any of the above!
 ```
-gbft fit --model SequenceClassification --model.backbone rnafm_cds --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
-gbft test --model SequenceClassification --model.backbone rnafm_cds --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
 ```
-For more information, visit: [Model Generator](https://github.com/genbio-ai/modelgenerator)

 ---
 base_model:
+- genbio-ai/AIDO.RNA-1.6B
 ---
+# AIDO.RNA-1.6B-CDS
+AIDO.RNA-1.6B-CDS is a domain adaptation model on the coding sequences. It was pre-trained on 9 million coding sequences released by Carlos et al. (2024) [1] based on our [AIDO.RNA-1.6B](https://huggingface.co/genbio-ai/AIDO.RNA-1.6B) model.
 ## How to Use
 ### Build any downstream models from this backbone
 #### Embedding
 ```python
 from genbio_finetune.tasks import Embed
+model = Embed.from_config({"model.backbone": "aido_rna_1b600m_cds"}).eval()
 collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
 embedding = model(collated_batch)
 print(embedding.shape)
 print(embedding)
 ```
+#### Regression
+```python
+from genbio_finetune.tasks import SequenceRegression
+model = SequenceRegression.from_config({"model.backbone": "aido_rna_1b600m_cds"}).eval()
+collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
+logits = model(collated_batch)
+print(logits)
+```
 #### Sequence Level Classification
 ```python
 import torch
 from genbio_finetune.tasks import SequenceClassification
+model = SequenceClassification.from_config({"model.backbone": "aido_rna_1b600m_cds", "model.n_classes": 2}).eval()
 collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
 logits = model(collated_batch)
 print(logits)
 print(torch.argmax(logits, dim=-1))
 ```
 #### Token Level Classification
 ```python
 import torch
 from genbio_finetune.tasks import TokenClassification
+model = TokenClassification.from_config({"model.backbone": "aido_rna_1b600m_cds", "model.n_classes": 3}).eval()
 collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
 logits = model(collated_batch)
 print(logits)
 print(torch.argmax(logits, dim=-1))
 ```
 #### Or use our one-liner CLI to finetune or evaluate any of the above!
 ```
+mgen fit --model SequenceClassification --model.backbone aido_rna_1b600m_cds --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
+mgen test --model SequenceClassification --model.backbone aido_rna_1b600m_cds --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
 ```
+For more information, visit: [ModelGenerator](https://github.com/genbio-ai/modelgenerator)
+## Reference
+1. Carlos Outeiral and Charlotte M Deane. Codon language embeddings provide strong signals for use in protein engineering. Nature Machine Intelligence, 6(2):170–179, 2024.
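This commit renames the backbone identifier from `rnafm_cds` to `aido_rna_1b600m_cds` and the CLI entry point from `gbft` to `mgen`, while the task names and flags stay the same. Below is a minimal sketch of how existing commands and `from_config` dictionaries could be updated. It assumes those two renames are the only changes needed. `migrate_command` and `migrate_config` are hypothetical helpers written for illustration and are not part of ModelGenerator.

```python
import shlex

# Renames observed in this commit's README diff (old -> new).
# Assumption: nothing else about the CLI flags or config keys changed.
CLI_RENAMES = {"gbft": "mgen"}
BACKBONE_RENAMES = {"rnafm_cds": "aido_rna_1b600m_cds"}


def migrate_command(old_cmd: str) -> str:
    """Hypothetical helper: rewrite a pre-commit CLI call to the new names."""
    tokens = shlex.split(old_cmd)
    if tokens and tokens[0] in CLI_RENAMES:
        tokens[0] = CLI_RENAMES[tokens[0]]
    # Rename only the value that directly follows --model.backbone.
    for i in range(len(tokens) - 1):
        if tokens[i] == "--model.backbone":
            tokens[i + 1] = BACKBONE_RENAMES.get(tokens[i + 1], tokens[i + 1])
    return shlex.join(tokens)


def migrate_config(old_config: dict) -> dict:
    """Hypothetical helper: rewrite a from_config dict to the new backbone name."""
    new_config = dict(old_config)  # shallow copy; the input dict is left unchanged
    backbone = new_config.get("model.backbone")
    if backbone in BACKBONE_RENAMES:
        new_config["model.backbone"] = BACKBONE_RENAMES[backbone]
    return new_config


if __name__ == "__main__":
    print(migrate_command(
        "gbft fit --model SequenceClassification --model.backbone rnafm_cds "
        "--data SequenceClassification --data.path my_dataset"
    ))
    print(migrate_config({"model.backbone": "rnafm_cds", "model.n_classes": 2}))
```

Because the helper rebuilds the command with `shlex.join`, arguments containing shell metacharacters (such as the `<hf_or_local_path_to_your_dataset>` placeholder) come back single-quoted, which keeps the output safe to paste into a shell.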