datasets:
  - genbio-ai/transcript_isoform_expression_prediction
base_model:
  - genbio-ai/AIDO.RNA-1.6B-CDS
  - EleutherAI/enformer-official-rough
  - facebook/esm2_t30_150M_UR50D
metrics:
  - spearmanr
  - r_squared
tags:
  - biology

Tri-modal model for RNA isoform expression prediction

RNA isoform expression prediction

  • Input: dna_seq, rna_seq, protein_seq
  • Output: expression level in 30 tissues

Model architecture

  • Backbones:
    • DNA: Enformer (full fine-tuning)
    • RNA: AIDO.RNA-1.6B-CDS (LoRA fine-tuning)
    • Protein: ESM2-150M (LoRA fine-tuning)
  • Fusion method: concat fusion
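
A minimal sketch of what concat fusion means here: the three per-modality embeddings are joined end to end, and a prediction head maps the fused vector to one value per tissue. The embedding sizes, the random weights, and the single linear head are illustrative assumptions; the model card does not describe the actual head or the backbone output widths.

```python
import random

NUM_TISSUES = 30  # from the model card: expression level in 30 tissues


def concat_fusion(dna_emb, rna_emb, protein_emb):
    """Join the three per-modality embedding vectors end to end."""
    return list(dna_emb) + list(rna_emb) + list(protein_emb)


def linear_head(features, weights, bias):
    """Map fused features to one prediction per output: y = W x + b."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


# Toy embedding sizes (assumptions, not the real backbone widths).
rng = random.Random(0)
dna_emb = [rng.gauss(0, 1) for _ in range(8)]
rna_emb = [rng.gauss(0, 1) for _ in range(6)]
protein_emb = [rng.gauss(0, 1) for _ in range(4)]

fused = concat_fusion(dna_emb, rna_emb, protein_emb)

# Hypothetical head: one weight row per tissue, randomly initialised.
weights = [[rng.gauss(0, 0.1) for _ in fused] for _ in range(NUM_TISSUES)]
bias = [0.0] * NUM_TISSUES
prediction = linear_head(fused, weights, bias)

print(len(fused), len(prediction))
```

The fused vector length is the sum of the three embedding sizes, so the head's input width depends on all three backbones.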

Usage

Download model

from huggingface_hub import snapshot_download
from pathlib import Path

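# Hugging Face repo ID of the fine-tuned checkpoint
model_name = "genbio-ai/AIDO.MM-Enformer-RNA-1.6B-CDS-ESM2-150M-ConcatFusion-rna-isoform-expression-ckpt"
# Local target directory: ~/genbio_models/<repo ID>
genbio_models_path = Path.home().joinpath('genbio_models', model_name)
genbio_models_path.mkdir(parents=True, exist_ok=True)
# Download every file in the repo into that directory
snapshot_download(repo_id=model_name, local_dir=genbio_models_path)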
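
After the download, it can help to list candidate config and checkpoint files in the local directory. The sketch below is one way to do that. The helper name `find_candidates` is hypothetical, and the file extensions (`.ckpt`, `.yaml`, `.yml`) are assumptions about how the repository is laid out, so check the printed paths against the actual contents.

```python
from pathlib import Path


def find_candidates(root, suffixes):
    """Return files under root whose extension is in suffixes, sorted by path."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix in suffixes
    )


model_name = "genbio-ai/AIDO.MM-Enformer-RNA-1.6B-CDS-ESM2-150M-ConcatFusion-rna-isoform-expression-ckpt"
download_dir = Path.home().joinpath("genbio_models", model_name)

# Only scan if the snapshot has already been downloaded.
if download_dir.exists():
    # Assumed extensions; adjust if the repo uses different ones.
    for path in find_candidates(download_dir, {".ckpt", ".yaml", ".yml"}):
        print(path)
```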

Evaluation script

Once the model is downloaded, you can evaluate it with ModelGenerator using the following script:

CONFIG_FILE=...     # put the config file path here
CKPT_PATH=...       # put the model checkpoint path here

mgen test --config "$CONFIG_FILE" \
    --data.batch_size 16 \
    --trainer.logger null \
    --model.strict_loading False \
    --model.reset_optimizer_states True \
    --ckpt_path "$CKPT_PATH"
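
The metadata lists `spearmanr` and `r_squared` as metrics. For reference, the sketch below computes both from their standard definitions in plain Python. It is not ModelGenerator's implementation, and how the test run aggregates the scores (per tissue or across all tissues) is not stated in this card; the example values are made up.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearmanr(y_true, y_pred):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(y_true), average_ranks(y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up expression values for one tissue across four isoforms.
y_true = [0.1, 0.4, 0.35, 0.8]
y_pred = [0.2, 0.5, 0.3, 0.7]
print(spearmanr(y_true, y_pred), r_squared(y_true, y_pred))
```

Spearman correlation depends only on the ordering of the predictions, while R² also penalises errors in scale and offset, so the two can disagree on the same predictions.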