Commit 8fd392f (verified) by thrumbel · 1 parent: 88e8f4d

Create README.md

  1. README.md +72 -0
README.md ADDED
@@ -0,0 +1,72 @@
+ ---
+ library_name: biomed-multi-omic
+ license: apache-2.0
+ tags:
+ - Biology
+ - RNA
+ datasets:
+ - PanglaoDB
+ - CELLxGENE
+ ---
+
+ # ibm-research/biomed.rna.bert.110m.wced.v1
+
+ Biomedical foundation models for omics data. This package supports the development of foundation models for scRNA and DNA data.
+
+ `biomed-multi-omic` enables development and testing of foundation models for DNA sequences and for RNA expression,
+ with modular model and training methods for pretraining and fine-tuning, controllable via a declarative no-code interface.
+ `biomed-multi-omic` leverages anndata, HuggingFace Transformers, PyTorch Lightning and Hydra.
+
+ - 🧬 A single package for DNA and RNA foundation models: scRNA pretraining on h5ad files or TileDB (e.g., CELLxGENE), and DNA pretraining on the human reference genome (GRCh38/hg38) as well as on variant-imputed genomes based on common SNPs from the GWAS Catalog and ClinVar datasets.
+ - 🚀 Leverages the latest open-source tools: anndata, HuggingFace Transformers and PyTorch Lightning.
+ - 📈 Zero-shot and fine-tuning support for diverse downstream tasks: cell type annotation and perturbation prediction for scRNA; promoter prediction and regulatory region prediction using massively parallel reporter assays (MPRAs) for DNA sequences.
+ - Novel pretraining strategies for scRNA and DNA implemented alongside existing methods to enable experimentation and comparison.
+
+ For details on how the models were trained, please refer to [the BMFM-RNA preprint](https://arxiv.org/abs/2506.14861).
+
+ - **Developers:** IBM Research
+ - **GitHub Repository:** [https://github.com/BiomedSciAI/biomed-multi-omic](https://github.com/BiomedSciAI/biomed-multi-omic)
+ - **Paper:** [BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models](https://arxiv.org/abs/2506.14861)
+ - **Release Date:** Jun 17th, 2025
+ - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
+
+ ## Checkpoint
+
+ Whole-cell Expression Decoder (WCED): Using the BMFM-RNA framework, we implemented a new pretraining objective centered on predicting the expression levels of the whole cell at once, rather than only those of the masked genes.
+
+ **WCED 10 pct:** Trained using WCED with random gene order and log-normalization.
+
+ See section 2.3.4 of [the BMFM-RNA manuscript](https://arxiv.org/abs/2506.14861) for more details.
+
+ ## Usage
+
+ Using `biomed.rna.bert.110m.wced.v1` requires the codebase [https://github.com/BiomedSciAI/biomed-multi-omic](https://github.com/BiomedSciAI/biomed-multi-omic).
+
+ For installation, please follow the [instructions on GitHub](https://github.com/BiomedSciAI/biomed-multi-omic?tab=readme-ov-file#installation).
+
+ ## RNA Inference
+
+ To get embeddings and predictions for scRNA data, run:
+
+ ```bash
+ export MY_DATA_FILE=... # path to h5ad file with raw counts and gene symbols
+ bmfm-targets-run -cn predict input_file=$MY_DATA_FILE working_dir=/tmp checkpoint=ibm-research/biomed.rna.bert.110m.wced.v1
+ ```
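
When several h5ad files need embeddings, the command above can be wrapped in a small loop. The following is a minimal sketch, not part of the package: the helper names `build_predict_cmd` and `predict_all`, the `DRY_RUN` switch, and the one-working-directory-per-file layout are assumptions made for illustration. It uses only the `input_file`, `working_dir` and `checkpoint` arguments shown in the command above.

```shell
# Sketch only: helper names, DRY_RUN and the directory layout are hypothetical.
CHECKPOINT="ibm-research/biomed.rna.bert.110m.wced.v1"

# Print the predict command for one input file and working directory.
# The printed line is for inspection only; paths containing spaces are not quoted.
build_predict_cmd() {
  local input_file="$1" working_dir="$2"
  printf 'bmfm-targets-run -cn predict input_file=%s working_dir=%s checkpoint=%s\n' \
    "$input_file" "$working_dir" "$CHECKPOINT"
}

# Run the predict command for every .h5ad file in a directory,
# or only print the commands when DRY_RUN=1.
predict_all() {
  local data_dir="$1" out_root="$2" f name
  for f in "$data_dir"/*.h5ad; do
    [ -e "$f" ] || continue              # no .h5ad files: skip the unexpanded glob
    name="$(basename "$f" .h5ad)"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      build_predict_cmd "$f" "$out_root/$name"
    else
      mkdir -p "$out_root/$name"         # assumed layout: one working dir per input file
      bmfm-targets-run -cn predict input_file="$f" \
        working_dir="$out_root/$name" checkpoint="$CHECKPOINT"
    fi
  done
}
```

For example, setting `DRY_RUN=1` and calling `predict_all ./data /tmp/bmfm-out` prints one command per file without running anything, which makes it easy to check the paths before starting inference.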
+
+ For more details see the [RNA tutorials on GitHub](https://github.com/BiomedSciAI/biomed-multi-omic/tree/main/tutorials/RNA).
+
+ ## Citation
+
+ ```bibtex
+ @misc{dandala2025bmfmrnaopenframeworkbuilding,
+ title={BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models},
+ author={Bharath Dandala and Michael M. Danziger and Ella Barkan and Tanwi Biswas and Viatcheslav Gurev and Jianying Hu and Matthew Madgwick and Akira Koseki and Tal Kozlovski and Michal Rosen-Zvi and Yishai Shimoni and Ching-Huei Tsou},
+ year={2025},
+ eprint={2506.14861},
+ archivePrefix={arXiv},
+ primaryClass={q-bio.GN},
+ url={https://arxiv.org/abs/2506.14861},
+ }
+ ```