pszemraj committed on
Commit 4f4a121 · verified · 1 Parent(s): 4d8a2ca

Update README.md

Files changed (1)
  1. README.md +8 -5
README.md CHANGED
@@ -11,10 +11,13 @@ tags:
  - t5
  ---
 
+ # tFINE-base-300m
 
- - 1024 ctx
- - SiLU activations
- - `fineweb-edu-dedup` split of `HuggingFaceTB/smollm-corpus`
+ An encoder-decoder (T5 architecture) pretrained with [nanoT5](https://github.com/pszemraj/nanoT5/tree/flan-dataset):
+
+ - tokenizer: custom llama2 with 48k vocab (from [vocab scaling laws](https://hf.co/collections/sail/scaling-laws-with-vocabulary-6699e0cbd77a8b2870859bfe))
+ - data: `fineweb-edu-dedup` split of `HuggingFaceTB/smollm-corpus`
+ - context length: 1024 ctx
 
  ## details
 
@@ -40,8 +43,8 @@ tags:
  - Gradient accumulation steps: 24
  - Final cosine learning rate: 1e-5
 
- 4. Hardware utilization:
- - Device: GPU
+ 4. Hardware:
+ - Device: RTX 4080
  - Precision: bfloat16, tf32
 
  ## plots
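
The second hunk keeps `Final cosine learning rate: 1e-5` as a context line, which suggests a cosine decay that bottoms out at 1e-5. Below is a minimal sketch of such a schedule, not nanoT5's actual scheduler. The function name `cosine_lr`, the peak rate, and the step count are hypothetical (none of them appear in this diff), and warmup is left out for brevity.

```python
import math


def cosine_lr(step: int, total_steps: int, peak_lr: float, final_lr: float = 1e-5) -> float:
    """Cosine decay from peak_lr at step 0 down to final_lr at total_steps.

    peak_lr and total_steps are illustrative assumptions; only the
    final value of 1e-5 comes from the README being edited.
    """
    # Clamp the step so that values past the end stay at final_lr.
    progress = min(max(step, 0), total_steps) / total_steps
    # Goes smoothly from 1.0 (start) to 0.0 (end).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_lr + (peak_lr - final_lr) * cosine


if __name__ == "__main__":
    # Hypothetical peak rate and run length, chosen only for the demo.
    for s in (0, 2500, 5000, 7500, 10000):
        print(s, cosine_lr(s, total_steps=10000, peak_lr=1e-3))
```

With gradient accumulation (24 steps in this README), a schedule like this would typically be advanced once per optimizer update rather than once per micro-batch.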