|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- HuggingFaceTB/smollm-corpus |
|
language: |
|
- en |
|
pipeline_tag: text2text-generation |
|
library_name: transformers |
|
--- |
|
|
|
|
|
# tFINE-900m-e16-d32-1024ctx |
|
|
|
|
|
Pretrained T5 model with [nanoT5](https://github.com/pszemraj/nanoT5/tree/fineweb-edu-test) (a minimal usage sketch follows the list below):
|
|
|
- ~900M parameters: 16 encoder layers, 32 decoder layers
|
- SentencePiece tokenizer with a 48k vocabulary and byte fallback
|
- handles whitespace correctly (_unlike the original T5 tokenizer_)
|
- 1024-token context length during pretraining
|
- `relative_attention_num_buckets` increased from 32 to 48 for context-length upscaling
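
A minimal usage sketch with 🤗 Transformers is below. The Hub repo id (`pszemraj/tFINE-900m-e16-d32-1024ctx`) and the `<extra_id_0>`-style sentinel naming are assumptions inferred from the model name and standard T5 conventions; adjust them if this checkpoint differs. Since the model is only pretrained (span-corruption objective), prompt it with sentinel tokens rather than instructions.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumed repo id -- adjust if the checkpoint lives under a different name.
model_id = "pszemraj/tFINE-900m-e16-d32-1024ctx"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Sanity-check the architecture details listed above.
cfg = model.config
print(cfg.num_layers, cfg.num_decoder_layers)   # expected: 16, 32
print(cfg.relative_attention_num_buckets)       # expected: 48
print(tokenizer.vocab_size)                     # expected: ~48k

# Span-corruption-style prompt; sentinel naming assumed to follow the
# usual T5 <extra_id_N> convention.
prompt = "The capital of France is <extra_id_0>, a city known for <extra_id_1>."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```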
|
|
|
## Experiment logs |
|
|
|
Training consisted of two phases: |
|
|
|
- [phase one](https://wandb.ai/pszemraj/nanoT5/runs/l0y9uuv3) - ~30k steps at context length 512 |
|
- [phase two](https://wandb.ai/pszemraj/nanoT5/runs/mao0tqjy) - 20k steps at context length 1024 |