Commit 0364a6e by pszemraj · verified · 1 parent: fd92cbf

Update README.md

  1. README.md +29 -0
README.md CHANGED
@@ -16,6 +16,35 @@ tags:
 - SiLU activations
 - `fineweb-edu-dedup` split of `HuggingFaceTB/smollm-corpus`
 
+## details
+
+
+1. Model:
+   - Dropout rate: 0.0
+   - Activations: `silu`, `gated-silu`
+   - Model compilation: enabled
+
+2. Data processing:
+   - Input length: 1024
+   - MLM probability: 0.15
+
+3. Optimization:
+   - Optimizer: AdamW with scaling
+   - Base learning rate: 0.008
+   - Batch size: 120
+   - Total training steps: 80,000
+   - Warmup steps: 10,000
+   - Learning rate scheduler: Cosine
+   - Weight decay: 0.0001
+   - Gradient clipping: 1.0
+   - Gradient accumulation steps: 24
+   - Final cosine learning rate: 1e-5
+
+4. Hardware utilization:
+   - Device: GPU
+   - Precision: bfloat16, tf32
+
+
 ## plots
 
 
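
The optimization values added in this hunk (base learning rate 0.008, 10,000 warmup steps, 80,000 total steps, cosine scheduler, final learning rate 1e-5) imply a learning rate curve that can be sketched in a few lines. This is only an illustration, not the training code: the README does not state the warmup shape, so linear warmup from 0 is an assumption, and `lr_at` is a hypothetical helper name. How "AdamW with scaling" further transforms this nominal rate is not specified in the diff and is not modeled here.

```python
import math

# Values copied from the "Optimization" list in the README diff.
BASE_LR = 0.008
FINAL_LR = 1e-5
WARMUP_STEPS = 10_000
TOTAL_STEPS = 80_000


def lr_at(step: int) -> float:
    """Nominal learning rate at a given optimizer step.

    Assumption: warmup is linear from 0 to BASE_LR. The README only says
    "Warmup steps: 10,000" and does not give the warmup shape.
    """
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    # Cosine factor falls from 1 (end of warmup) to 0 (end of training).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FINAL_LR + (BASE_LR - FINAL_LR) * cosine


if __name__ == "__main__":
    for step in (0, 5_000, 10_000, 45_000, 80_000):
        print(f"step {step:>6}: lr = {lr_at(step):.6f}")
```

Under these assumptions the rate peaks at 0.008 at step 10,000 and decays to 1e-5 at step 80,000. Warmup therefore covers one eighth of the run.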