|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- HuggingFaceTB/smollm-corpus |
|
language: |
|
- en |
|
pipeline_tag: text2text-generation |
|
library_name: transformers |
|
--- |
|
|
|
|
|
# tFINE-900m-e16-d32-1024ctx |
|
|
|
|
|
Pretrained T5 model with [nanoT5](https://github.com/pszemraj/nanoT5/tree/fineweb-edu-test) (a minimal usage sketch follows the list below):
|
|
|
- ~900M parameters: 16 encoder layers, 32 decoder layers
|
- SentencePiece tokenizer with a 48k vocabulary and byte fallback
|
- handles whitespace correctly (_unlike the original T5 tokenizer_)
|
- 1024-token context length during pretraining
|
- `relative_attention_num_buckets` increased from 32 to 48 for context-length upscaling
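
A minimal usage sketch with 🤗 Transformers is below. The Hub repo id (`pszemraj/tFINE-900m-e16-d32-1024ctx`) and the `<extra_id_0>`-style sentinel naming are assumptions inferred from the model name and standard T5 conventions; adjust them if this checkpoint differs. Since the model is only pretrained (span-corruption objective), prompt it with sentinel tokens rather than instructions.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumed repo id -- adjust if the checkpoint lives under a different name.
model_id = "pszemraj/tFINE-900m-e16-d32-1024ctx"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Sanity-check the architecture details listed above.
cfg = model.config
print(cfg.num_layers, cfg.num_decoder_layers)   # expected: 16, 32
print(cfg.relative_attention_num_buckets)       # expected: 48
print(tokenizer.vocab_size)                     # expected: ~48k

# Span-corruption-style prompt; sentinel naming assumed to follow the
# usual T5 <extra_id_N> convention.
prompt = "The capital of France is <extra_id_0>, a city known for <extra_id_1>."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```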
|
|
|
## Experiment logs |
|
|
|
Training consisted of two phases: |
|
|
|
- [phase one](https://wandb.ai/pszemraj/nanoT5/runs/l0y9uuv3) - ~30k steps at context length 512 |
|
- [phase two](https://wandb.ai/pszemraj/nanoT5/runs/mao0tqjy) - 20k steps at context length 1024 |