Update README.md
README.md
CHANGED
@@ -11,10 +11,13 @@ tags:
 - t5
 ---
 
+# tFINE-base-300m
 
--
-
--
+An encoder-decoder (T5 architecture) pretrained with [nanoT5](https://github.com/pszemraj/nanoT5/tree/flan-dataset):
+
+- tokenizer: custom llama2 with 48k vocab (from [vocab scaling laws](https://hf.co/collections/sail/scaling-laws-with-vocabulary-6699e0cbd77a8b2870859bfe))
+- data: `fineweb-edu-dedup` split of `HuggingFaceTB/smollm-corpus`
+- context length: 1024 ctx
 
 ## details
 
@@ -40,8 +43,8 @@ tags:
 - Gradient accumulation steps: 24
 - Final cosine learning rate: 1e-5
 
-4. Hardware
-- Device:
+4. Hardware:
+- Device: RTX 4080
 - Precision: bfloat16, tf32
 
 ## plots
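The second hunk lists two optimizer settings: 24 gradient accumulation steps and a final cosine learning rate of 1e-5. A minimal sketch of how the two interact, using only the Python standard library, is below. It is not nanoT5's scheduler. The peak learning rate and total step count do not appear in this diff, so `peak_lr` and `total_steps` are hypothetical placeholders, and warmup is left out.

```python
import math

GRAD_ACCUM_STEPS = 24   # from the README diff
FINAL_LR = 1e-5         # from the README diff


def cosine_lr(step, total_steps, peak_lr, final_lr=FINAL_LR):
    """Cosine decay from peak_lr (step 0) down to final_lr (step == total_steps).

    peak_lr and total_steps are assumptions for illustration only.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return final_lr + 0.5 * (peak_lr - final_lr) * (1.0 + math.cos(math.pi * progress))


def optimizer_steps(micro_batches, accum_steps=GRAD_ACCUM_STEPS):
    """Number of optimizer updates after a given number of micro-batches.

    With gradient accumulation, the optimizer (and therefore the scheduler)
    advances once every accum_steps micro-batches.
    """
    return micro_batches // accum_steps


if __name__ == "__main__":
    total = 1000   # hypothetical number of optimizer steps
    peak = 1e-3    # hypothetical peak learning rate
    for micro in (0, 24 * 500, 24 * 1000):
        step = optimizer_steps(micro)
        print(step, cosine_lr(step, total, peak))
```

The learning rate is indexed by optimizer step, not by micro-batch, so under these assumptions the schedule reaches 1e-5 only after `24 * total_steps` micro-batches have been processed.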