---
license: apache-2.0
datasets:
- HuggingFaceTB/smollm-corpus
language:
- en
pipeline_tag: text2text-generation
library_name: transformers
---
|
# tFINE-900m-e16-d32-1024ctx

A T5 model pretrained from scratch with [nanoT5](https://github.com/PiotrNawrot/nanoT5):
|
|
|
- ~900m parameters: 16 encoder layers, 32 decoder layers
- SentencePiece tokenizer with a 48k vocab & byte fallback
- handles whitespace etc. correctly, unlike the standard T5 tokenizer (see the sketch below)
- 1024-token context length during pretraining
- `relative_attention_num_buckets` increased from the default 32 to 48 for context-length upscaling
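
The whitespace handling can be sanity-checked with a quick encode/decode round trip. A minimal, untested sketch; the hub repo id below is an assumption based on the model name and the `pszemraj` wandb project:

```python
from transformers import AutoTokenizer

# assumed repo id; adjust if the checkpoint lives under a different namespace
repo_id = "pszemraj/tFINE-900m-e16-d32-1024ctx"
tokenizer = AutoTokenizer.from_pretrained(repo_id)

text = "def hello():\n\tprint('hi')   # tabs, newlines & runs of spaces"
ids = tokenizer(text, add_special_tokens=False).input_ids
roundtrip = tokenizer.decode(ids)

# with a whitespace-aware vocab + byte fallback, the round trip should be lossless
print(repr(text))
print(repr(roundtrip))
```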
|
|
|
## Experiment logs
|
|
|
Training consisted of two phases:
|
|
|
- [phase one](https://wandb.ai/pszemraj/nanoT5/runs/l0y9uuv3) - ~30k steps at context length 512
- [phase two](https://wandb.ai/pszemraj/nanoT5/runs/mao0tqjy) - 20k steps at context length 1024
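
## Usage

A minimal, untested loading/inference sketch. It assumes the checkpoint is published as `pszemraj/tFINE-900m-e16-d32-1024ctx` and loads as a standard `T5ForConditionalGeneration`; the prompt also assumes the tokenizer keeps T5-style `<extra_id_N>` sentinel tokens. This is a pretraining-only checkpoint, so treat raw generations as a smoke test rather than useful output.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

repo_id = "pszemraj/tFINE-900m-e16-d32-1024ctx"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = T5ForConditionalGeneration.from_pretrained(repo_id)

# the config should reflect the widened relative-attention buckets described above
print(model.config.relative_attention_num_buckets)  # expected: 48

# span-infilling smoke test, assuming T5-style sentinel tokens exist in the vocab
prompt = "The capital of France is <extra_id_0>, a city known for <extra_id_1>."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=24)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```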