---
base_model:
- genbio-ai/AIDO.RNA-1.6B
---

# AIDO.RNA-1.6B-CDS

AIDO.RNA-1.6B-CDS is a domain-adapted model for coding sequences (CDS). Starting from our [AIDO.RNA-1.6B](https://huggingface.co/genbio-ai/AIDO.RNA-1.6B) model, it was further pre-trained on 9 million coding sequences released by Outeiral and Deane (2024) [1].

## How to Use

### Build any downstream model from this backbone

#### Embedding

```python
from genbio_finetune.tasks import Embed

# Load the CDS backbone as an embedding model in inference mode
model = Embed.from_config({"model.backbone": "aido_rna_1b600m_cds"}).eval()
# Collate the raw sequences into a model-ready batch
collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
embedding = model(collated_batch)
print(embedding.shape)
print(embedding)
```

#### Regression

```python
from genbio_finetune.tasks import SequenceRegression

# Attach a sequence-level regression head to the CDS backbone
model = SequenceRegression.from_config({"model.backbone": "aido_rna_1b600m_cds"}).eval()
collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
logits = model(collated_batch)
print(logits)
```

#### Sequence-Level Classification

```python
import torch
from genbio_finetune.tasks import SequenceClassification

# One prediction per sequence, over 2 classes
model = SequenceClassification.from_config({"model.backbone": "aido_rna_1b600m_cds", "model.n_classes": 2}).eval()
collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
logits = model(collated_batch)
print(logits)
# Predicted class index for each sequence
print(torch.argmax(logits, dim=-1))
```

#### Token-Level Classification

```python
import torch
from genbio_finetune.tasks import TokenClassification

# One prediction per token, over 3 classes
model = TokenClassification.from_config({"model.backbone": "aido_rna_1b600m_cds", "model.n_classes": 3}).eval()
collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
logits = model(collated_batch)
print(logits)
# Predicted class index for each token
print(torch.argmax(logits, dim=-1))
```

#### Or use our one-liner CLI to fine-tune or evaluate any of the above!
```bash
mgen fit --model SequenceClassification --model.backbone aido_rna_1b600m_cds --data SequenceClassification --data.path
mgen test --model SequenceClassification --model.backbone aido_rna_1b600m_cds --data SequenceClassification --data.path
```

For more information, visit: [ModelGenerator](https://github.com/genbio-ai/modelgenerator)

## Reference

1. Carlos Outeiral and Charlotte M. Deane. Codon language embeddings provide strong signals for use in protein engineering. Nature Machine Intelligence, 6(2):170–179, 2024.
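
## Appendix: checking coding sequences before embedding

The usage examples pass toy sequences such as `"ACGT"` straight to `model.collate`. If your real inputs are meant to be complete coding sequences, a quick sanity check before embedding or fine-tuning can catch frame or annotation errors early. The sketch below is a hypothetical helper: `cds_problems` is not part of `genbio_finetune` or ModelGenerator. It assumes a DNA alphabet (A, C, G, T), as in the examples, and the stop codons of the standard genetic code (TAA, TAG, TGA). This card does not state that the backbone rejects partial codons, so treat the check as optional data hygiene rather than a model requirement.

```python
# Hypothetical helper (not part of genbio_finetune / ModelGenerator).
# Assumes DNA input over A, C, G, T and the standard genetic code.
VALID_BASES = frozenset("ACGT")
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def cds_problems(seq: str) -> list[str]:
    """Return a list of problems found in a coding sequence (empty if none)."""
    seq = seq.upper()
    if not seq:
        return ["empty sequence"]
    problems = []
    # Any character outside the assumed DNA alphabet
    invalid = sorted(set(seq) - VALID_BASES)
    if invalid:
        problems.append("unexpected characters: " + "".join(invalid))
    # A complete CDS should consist of whole codons
    if len(seq) % 3 != 0:
        problems.append(f"length {len(seq)} is not a multiple of 3")
    # Split into whole codons; a trailing partial codon is ignored here
    whole = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, whole, 3)]
    # A stop codon before the last codon suggests a frame or annotation error
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("in-frame stop codon before the end")
    return problems


if __name__ == "__main__":
    sequences = ["ATGGCTGAATAA", "ATGTAAGCT", "ACGT"]
    for s in sequences:
        print(s, cds_problems(s) or "ok")
    # Keep only sequences that pass all checks
    clean = [s for s in sequences if not cds_problems(s)]
    print(clean)
```

Sequences that pass can then be collated exactly as in the examples above, for instance `model.collate({"sequences": clean})`.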