A Mistral-based, 500M-parameter decoder-only genomic foundation model (GFM) trained on 50 genomes from 1000 Genomes Project data with a sequence length of 4K nucleotides. It serves as a baseline for the gfm-random-eval paper.
## Model Details
- Model developers: M42 Health AI Team
- Base architecture: MistralForCausalLM
- Context length:
  - Training: 4k tokens
  - Inference: 4k tokens
- Training data: 1000 Genomes
- Input format: Raw DNA sequences
- Output options (see the usage sketch after this list):
  - DNA sequences only
  - Embeddings
- License: CC BY-NC 4.0
- Publication: paper link
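Below is a minimal usage sketch showing how a model with this architecture could be loaded through the standard Hugging Face `transformers` causal-LM API to produce both output types listed above. The repository id is a placeholder (the card does not state it), and the mean-pooled last hidden state is one common way to obtain a sequence embedding, not necessarily the method used in the paper.

```python
# Minimal sketch, assuming the model loads via AutoModelForCausalLM.
# "m42-health/gfm-mistral-500m" is a hypothetical repository id.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "m42-health/gfm-mistral-500m"  # placeholder, not the real repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

sequence = "ACGTACGTTAGCCGATAA"  # raw DNA input, up to the 4k-token context
inputs = tokenizer(sequence, return_tensors="pt")

with torch.no_grad():
    # Output option 1: generate a DNA-sequence continuation.
    generated = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(generated[0], skip_special_tokens=True))

    # Output option 2: extract an embedding by mean-pooling the
    # final-layer hidden states over the sequence dimension.
    outputs = model(**inputs, output_hidden_states=True)
    embedding = outputs.hidden_states[-1].mean(dim=1)
    print(embedding.shape)  # (1, hidden_size)
```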