---
license: cc-by-nc-4.0
---
A Mistral-based, 500M-parameter, decoder-only genomic foundation model (GFM) trained on 50 genomes from the 1000 Genomes Project, with a sequence length of 4k nucleotides.
It serves as a baseline for the gfm-random-eval paper.
## Model Details
- **Model developers:** M42 Health AI Team
- **Base architecture:** [MistralForCausalLM](https://huggingface.co/docs/transformers/main/en/model_doc/mistral#transformers.MistralForCausalLM)
- **Context length:**
  - **Training:** 4k tokens
  - **Inference:** 4k tokens
- **Training data:** 1000 Genomes
- **Input format:** Raw DNA sequences
- **Output options:**
  - DNA sequences only
  - Embeddings
- **License:** CC BY-NC 4.0
- **Publication:** [Paper](https://www.biorxiv.org/content/10.1101/2024.12.18.628606v2)
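
## Usage

A minimal sketch of loading the model and producing both output types listed above, assuming it loads through the standard `transformers` causal-LM API. The repository id `m42-health/gfm-500m` is a placeholder, not confirmed by this card; substitute the actual path.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- replace with this model's actual Hugging Face path.
model_id = "m42-health/gfm-500m"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

# Input is a raw DNA sequence, up to the 4k-token context length.
dna = "ATGCGTACGTTAGCCTAGGA"
inputs = tokenizer(dna, return_tensors="pt")

# Option 1: generate a DNA sequence continuation.
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(generated[0]))

# Option 2: extract an embedding (mean-pooled last hidden layer).
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)
embedding = outputs.hidden_states[-1].mean(dim=1)
print(embedding.shape)  # (1, hidden_size)
```

Mean pooling over the last hidden layer is one common way to get a fixed-size sequence embedding from a decoder-only model; other pooling strategies (e.g. last-token) may be preferable depending on the downstream task.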