---
license: cc-by-nc-4.0
---

A Mistral-based 500M-parameter decoder-only genomic foundation model (GFM) trained on 50 genomes from the 1000 Genomes Project with a sequence length of 4K nucleotides.
It serves as a baseline for the gfm-random-eval paper.

## Model Details
- **Model developers:** M42 Health AI Team
- **Base architecture:** [MistralForCausalLM](https://huggingface.co/docs/transformers/main/en/model_doc/mistral#transformers.MistralForCausalLM)
- **Context length:**
  - **Training:** 4k tokens
  - **Inference:** 4k tokens
- **Training data:** 1000 Genomes
- **Input format:** Raw DNA sequences
- **Output options:**
  - DNA sequences only
  - Embeddings
- **License:** CC BY-NC 4.0
- **Publication:** [paper link](https://www.biorxiv.org/content/10.1101/2024.12.18.628606v2)