---
license: apache-2.0
datasets:
- HuggingFaceTB/smollm-corpus
language:
- en
pipeline_tag: text2text-generation
library_name: transformers
---
|
# tFINE-900m-e16-d32-1024ctx

A T5 model pretrained from scratch with [nanoT5](https://github.com/PiotrNawrot/nanoT5):
|
|
|
- ~900m parameters: 16 encoder layers, 32 decoder layers
- SentencePiece tokenizer with a 48k vocab & byte fallback
- handles whitespace etc. correctly, unlike the standard T5 tokenizer (see the sketch below)
- 1024-token context length during pretraining
- `relative_attention_num_buckets` increased from the default 32 to 48 for context-length upscaling
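
The whitespace handling can be sanity-checked with a quick encode/decode round trip. A minimal, untested sketch; the hub repo id below is an assumption based on the model name and the `pszemraj` wandb project:

```python
from transformers import AutoTokenizer

# assumed repo id; adjust if the checkpoint lives under a different namespace
repo_id = "pszemraj/tFINE-900m-e16-d32-1024ctx"
tokenizer = AutoTokenizer.from_pretrained(repo_id)

text = "def hello():\n\tprint('hi')   # tabs, newlines & runs of spaces"
ids = tokenizer(text, add_special_tokens=False).input_ids
roundtrip = tokenizer.decode(ids)

# with a whitespace-aware vocab + byte fallback, the round trip should be lossless
print(repr(text))
print(repr(roundtrip))
```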
|
|
|
## Experiment logs
|
|
|
Training consisted of two phases:
|
|
|
- [phase one](https://wandb.ai/pszemraj/nanoT5/runs/l0y9uuv3) - ~30k steps at context length 512
- [phase two](https://wandb.ai/pszemraj/nanoT5/runs/mao0tqjy) - 20k steps at context length 1024
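
## Usage

A minimal, untested loading/inference sketch. It assumes the checkpoint is published as `pszemraj/tFINE-900m-e16-d32-1024ctx` and loads as a standard `T5ForConditionalGeneration`; the prompt also assumes the tokenizer keeps T5-style `<extra_id_N>` sentinel tokens. This is a pretraining-only checkpoint, so treat raw generations as a smoke test rather than useful output.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

repo_id = "pszemraj/tFINE-900m-e16-d32-1024ctx"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = T5ForConditionalGeneration.from_pretrained(repo_id)

# the config should reflect the widened relative-attention buckets described above
print(model.config.relative_attention_num_buckets)  # expected: 48

# span-infilling smoke test, assuming T5-style sentinel tokens exist in the vocab
prompt = "The capital of France is <extra_id_0>, a city known for <extra_id_1>."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=24)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```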