Occiglot5
Occiglot5 is a modern T5 model for German with 1.42B parameters and the following features:
- Pretrained on the German Occiglot FineWeb corpus (excluding deWaC and Open Legal Data) and on the 10BT subsets of FineWeb and FineWeb-Edu
- Uses the UL2 pretraining objective
- Uses the efficient T5 architecture from the "Scale Efficiently" paper
- Pretrained for 5M steps with a batch size of 128 and an input/output sequence length of 512
- Trained in a single run on a v4-32 TPU Pod for 22.3 days without any crashes
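UL2 mixes denoising objectives built on T5-style span corruption, where masked spans are replaced with sentinel tokens and the removed text becomes the target. A minimal, illustrative sketch of how one input/target pair is formed (span positions are fixed here for clarity; the actual pretraining pipeline samples them and operates on subword ids, not words):

```python
def span_corrupt(tokens, spans):
    """Replace each (start, end) span with a sentinel token and
    collect the removed tokens as the target sequence (T5-style)."""
    inp, tgt = [], []
    prev = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inp.extend(tokens[prev:start])   # keep text before the span
        inp.append(sentinel)             # mark the removed span
        tgt.append(sentinel)
        tgt.extend(tokens[start:end])    # removed tokens go to the target
        prev = end
    inp.extend(tokens[prev:])
    tgt.append(f"<extra_id_{len(spans)}>")  # final sentinel closes the target
    return inp, tgt

tokens = "Das ist ein kurzer deutscher Beispielsatz".split()
inp, tgt = span_corrupt(tokens, [(1, 2), (4, 6)])
# inp: ['Das', '<extra_id_0>', 'ein', 'kurzer', '<extra_id_1>']
# tgt: ['<extra_id_0>', 'ist', '<extra_id_1>', 'deutscher', 'Beispielsatz', '<extra_id_2>']
```

The `<extra_id_N>` sentinel format matches the convention of T5 tokenizers, so decoded outputs from such a checkpoint can be reassembled the same way.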
Acknowledgments
Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC). Many thanks for providing access to the TPUs over many years ❤️
Made in the Bavarian Oberland with ❤️ and 🥨.