Text Generation
Transformers
Safetensors
English
llama
text-generation-inference

tinyllama-proteinpretrain-quinoa

Full-model finetuning of TinyLLaMA-1.1B on the "research" split (quinoa protein sequences) of the GreenBeing-Proteins dataset.
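
A minimal usage sketch, assuming the repository loads with the standard `transformers` auto classes and that a raw amino-acid string is an acceptable prompt (the prompt format used during training is not documented here). The sampling settings and the `longest_run` helper are illustrative assumptions, not part of the model; the helper only flags the single-residue repetition described in the notes below.

```python
MODEL_ID = "monsoon-nlp/tinyllama-proteinpretrain-quinoa"


def longest_run(sequence: str) -> int:
    """Return the length of the longest run of one repeated character."""
    best = current = 0
    previous = None
    for char in sequence:
        # Extend the current run if the character repeats, otherwise restart at 1.
        current = current + 1 if char == previous else 1
        best = max(best, current)
        previous = char
    return best


def generate_sequence(prompt: str, max_new_tokens: int = 64) -> str:
    """Sample a continuation of `prompt` (downloads the weights on first use)."""
    # Imported lazily so longest_run() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        top_p=0.9,  # assumption: arbitrary sampling settings
        repetition_penalty=1.2,  # assumption: may reduce repetition, untested here
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_sequence("MK")` (a hypothetical prompt) and passing the result to `longest_run` gives a quick signal of whether the output has collapsed into a single repeated residue.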

Notes: pretraining only on sequences leads the model to generate nothing but protein sequences, which eventually degenerate into a single repeated residue such as VVVV or KKKK.

  • This model may be replaced by one trained on a mix of bio/chem text and protein sequences.
  • This model might need "biotokens" to represent the amino acids instead of using the existing tokenizer.
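
The card does not define "biotokens", so the following is only one possible reading: give each of the 20 standard amino acids its own dedicated token, so that every residue maps to exactly one ID instead of being split by the existing subword tokenizer. The `<aa_X>` token format and the helper names are hypothetical; `add_tokens` and `resize_token_embeddings` are the standard `transformers` calls for extending a vocabulary.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def biotoken(residue: str) -> str:
    """Hypothetical token format: one dedicated token per amino acid."""
    return f"<aa_{residue}>"


BIOTOKENS = [biotoken(aa) for aa in AMINO_ACIDS]


def to_biotokens(sequence: str) -> str:
    """Rewrite a protein sequence as a string of dedicated residue tokens."""
    unknown = set(sequence) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return "".join(biotoken(aa) for aa in sequence)


def extend_tokenizer(tokenizer, model) -> int:
    """Register the biotokens and grow the embedding matrix to match.

    The new embedding rows start untrained, so the model would need
    further training before these tokens carry any meaning.
    """
    added = tokenizer.add_tokens(BIOTOKENS)
    model.resize_token_embeddings(len(tokenizer))
    return added
```

With this scheme, `to_biotokens("MKV")` yields `<aa_M><aa_K><aa_V>`, and training data would be rewritten the same way before tokenization.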

More details TBD

Model size: 1.1B params. Tensor type: F32 (Safetensors).
