tinyllama-proteinpretrain-quinoa

Full-model fine-tuning of TinyLlama-1.1B on the "research" split (quinoa protein sequences) of the GreenBeing-Proteins dataset.

Notes: pretraining only on sequences leads the model to generate only protein sequences, eventually repeating VVVV or KKKK.
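
A minimal sketch of how one might sample from the model and cut off the degenerate repeated tail described in the note above. The repo ID is the one on this page; the prompt (a couple of starting residues), the sampling settings, and the helper names `trim_repeated_tail` and `generate_protein` are assumptions for illustration, not a documented usage of this model.

```python
import re


def trim_repeated_tail(sequence: str, max_run: int = 4) -> str:
    """Cut the sequence at the first run of one residue longer than max_run.

    The first max_run residues of that run are kept; everything after is dropped.
    """
    # (.)\1{max_run,} matches a character followed by at least max_run copies of itself,
    # i.e. a run of max_run + 1 or more identical characters.
    pattern = re.compile(r"(.)\1{%d,}" % max_run)
    match = pattern.search(sequence)
    if match is None:
        return sequence
    return sequence[: match.start() + max_run]


def generate_protein(prompt: str = "MA", max_new_tokens: int = 200) -> str:
    """Sample one continuation from the model (downloads weights on first call).

    The prompt format is an assumption: the model card does not specify one.
    """
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = "monsoon-nlp/tinyllama-proteinpretrain-quinoa"
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # sampling settings are illustrative, not tuned
        top_p=0.9,
    )
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    # Drop any whitespace, then trim the repeated tail.
    return trim_repeated_tail("".join(text.split()))
```

Calling `generate_protein()` requires `transformers` and `torch`. Trimming only hides the repetition after the fact; it does not change what the model has learned.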

This model may be replaced by one trained on a mix of bio/chem text and protein sequences.
This model might need "biotokens" to represent the amino acids instead of relying on the existing tokenizer.
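
One way the "biotokens" idea could look, as a sketch only: give each of the 20 standard amino acids its own dedicated token and add those tokens to the base tokenizer. The `<aa_X>` token format and the helper names below are hypothetical, and the card does not say this is the approach that would be used. `add_tokens` and `resize_token_embeddings` are standard `transformers` calls.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes


def make_biotokens(alphabet: str = AMINO_ACIDS) -> list[str]:
    """Return one dedicated token string per residue (hypothetical format)."""
    return [f"<aa_{residue}>" for residue in alphabet]


def to_biotoken_text(sequence: str) -> str:
    """Rewrite a one-letter protein sequence as a string of biotokens."""
    unknown = set(sequence) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return "".join(f"<aa_{residue}>" for residue in sequence)


def extend_tokenizer(
    base_model: str = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T",
):
    """Add the biotokens to the base tokenizer and resize the embeddings.

    Downloads the base model on first call. The rows added to the embedding
    matrix are newly initialized and would need training before use.
    """
    # Imported lazily so the pure-Python helpers work without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForCausalLM.from_pretrained(base_model)
    added = tokenizer.add_tokens(make_biotokens())
    model.resize_token_embeddings(len(tokenizer))
    return tokenizer, model, added
```

With dedicated tokens, each residue would map to a single ID rather than whatever subword pieces the existing tokenizer happens to produce for a run of letters.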

More details TBD

Model size: 1.1B params (Safetensors)
Tensor type: F32

Model tree for monsoon-nlp/tinyllama-proteinpretrain-quinoa

Base model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
Finetuned: this model
