readme: add more clarifications about German FineWeb dataset, used for pretraining
README.md CHANGED

@@ -15,7 +15,7 @@ language:
 
 Occiglot5 is a modern [T5](https://arxiv.org/abs/1910.10683) model for German with 1.42B parameters and the following features:
 
-* Pretrained on the German Occiglot FineWeb corpus and on the 10BT subsets of FineWeb and FineWeb-Edu
+* Pretrained on the German Occiglot FineWeb corpus (except deWaC and Open Legal Data) and on the 10BT subsets of FineWeb and FineWeb-Edu
 * [UL2](https://arxiv.org/abs/2205.05131) is used as pretraining objective
 * Efficient T5 architecture from the ["Scale Efficiently"](https://arxiv.org/abs/2109.10686) paper is used
 * Pretrained for 5M steps using a batch size of 128 and an input/output sequence length of 512
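The UL2 objective referenced above mixes several denoising tasks; one of them is standard T5-style span corruption, where contiguous token spans are replaced by sentinel tokens and the model learns to reconstruct them. A minimal sketch of that transformation, assuming pre-chosen corruption spans and T5's `<extra_id_*>` sentinel naming (the function name and example spans are illustrative, not taken from the model's training code):

```python
def span_corrupt(tokens, spans):
    """T5-style span corruption (one denoiser in the UL2 mixture).

    Replaces each (start, end) span in `tokens` with a sentinel token
    <extra_id_0>, <extra_id_1>, ... and builds the matching decoder
    target, which lists each sentinel followed by the dropped tokens.
    Returns (input_tokens, target_tokens).
    """
    inp, tgt = [], []
    pos = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inp.extend(tokens[pos:start])   # keep text before the span
        inp.append(sentinel)            # mask the span in the input
        tgt.append(sentinel)            # target: sentinel, then the
        tgt.extend(tokens[start:end])   # tokens that were masked out
        pos = end
    inp.extend(tokens[pos:])            # keep the trailing text
    tgt.append(f"<extra_id_{len(spans)}>")  # final end-of-target sentinel
    return inp, tgt


tokens = "The quick brown fox jumps over the lazy dog".split()
inp, tgt = span_corrupt(tokens, spans=[(1, 3), (6, 7)])
# inp: The <extra_id_0> fox jumps over <extra_id_1> lazy dog
# tgt: <extra_id_0> quick brown <extra_id_1> the <extra_id_2>
```

In real pretraining the spans are sampled randomly (the UL2 denoisers differ mainly in span length and corruption rate), but the input/target layout is the same as in this sketch.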