---
license: apache-2.0
datasets:
- HuggingFaceTB/smollm-corpus
language:
- en
pipeline_tag: text2text-generation
library_name: transformers
---

# tFINE-850m-24x24-1024ctx

T5 model pretrained with nanoT5:

- ~850m parameters: 24 encoder layers, 24 decoder layers
- SentencePiece tokenizer with a 48k vocab & byte-pair fallback
- handles whitespace etc. correctly (unlike the standard T5 tokenizer)
- 1024-token context length during pretraining
- `relative_attention_num_buckets` increased from the standard 32 to 48 for context-length upscaling

## Experiment logs

Training consisted of two phases:

- TODO
- TODO
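
## Usage

A minimal loading/inference sketch with 🤗 Transformers. The repo id below is a placeholder (prepend the correct namespace), and the presence of T5-style sentinel tokens such as `<extra_id_0>` in the custom tokenizer is an assumption rather than something stated above.

```python
# Minimal sketch: load the checkpoint and probe it with span infilling.
# Assumptions: the hub repo id (placeholder) and T5-style sentinel tokens
# (<extra_id_0>, ...) being present in the custom SentencePiece tokenizer.
from transformers import AutoTokenizer, T5ForConditionalGeneration

repo_id = "tFINE-850m-24x24-1024ctx"  # placeholder; use the full namespace/repo path

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = T5ForConditionalGeneration.from_pretrained(repo_id)

# The checkpoint is pretrain-only (denoising objective), so a sentinel-token
# infilling prompt is the most faithful way to exercise it without fine-tuning.
text = "The capital of France is <extra_id_0>, a city known for the <extra_id_1> tower."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```

Since this is a pretraining checkpoint rather than an instruction- or task-tuned model, downstream use will generally require fine-tuning on the target task.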