Update README.md
README.md CHANGED
@@ -16,6 +16,35 @@ tags:
- SiLU activations
- `fineweb-edu-dedup` split of `HuggingFaceTB/smollm-corpus`

+## details
+
+
+1. Model:
+   - Dropout rate: 0.0
+   - Activations: `silu`, `gated-silu`
+   - Model compilation: enabled
+
+2. Data processing:
+   - Input length: 1024
+   - MLM probability: 0.15
+
+3. Optimization:
+   - Optimizer: AdamW with scaling
+   - Base learning rate: 0.008
+   - Batch size: 120
+   - Total training steps: 80,000
+   - Warmup steps: 10,000
+   - Learning rate scheduler: Cosine
+   - Weight decay: 0.0001
+   - Gradient clipping: 1.0
+   - Gradient accumulation steps: 24
+   - Final cosine learning rate: 1e-5
+
+4. Hardware utilization:
+   - Device: GPU
+   - Precision: bfloat16, tf32
+
+
## plots
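The `gated-silu` activation listed under "Model" in the added section is not spelled out in the diff. Below is a minimal sketch of what a gated-SiLU (SwiGLU-style) feed-forward block typically looks like in PyTorch, assuming a standard transformer FFN with the 0.0 dropout rate from the list; the class name, projection names, and dimensions are illustrative, not taken from the model's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedSiLUFeedForward(nn.Module):
    """Gated-SiLU (SwiGLU-style) feed-forward block.

    Hypothetical sketch: names and shapes are illustrative, not the
    model's actual implementation.
    """

    def __init__(self, d_model: int, d_ff: int, dropout: float = 0.0):
        super().__init__()
        self.wi_0 = nn.Linear(d_model, d_ff, bias=False)  # gate projection
        self.wi_1 = nn.Linear(d_model, d_ff, bias=False)  # value projection
        self.wo = nn.Linear(d_ff, d_model, bias=False)    # output projection
        self.dropout = nn.Dropout(dropout)                # 0.0 per the README

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SiLU-activated gate multiplied elementwise with the linear branch.
        hidden = F.silu(self.wi_0(x)) * self.wi_1(x)
        return self.wo(self.dropout(hidden))


if __name__ == "__main__":
    block = GatedSiLUFeedForward(d_model=512, d_ff=2048)
    out = block(torch.randn(2, 1024, 512))  # (batch, seq_len=1024, d_model)
    print(out.shape)
```

The plain `silu` variant differs only in that the gate branch is dropped and a single projection feeds the activation.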
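The "Optimization" and "Hardware utilization" entries read like a training configuration. The sketch below shows one plausible way to wire those numbers together in plain PyTorch: linear warmup for 10,000 steps, cosine decay to a final learning rate of 1e-5 over 80,000 total steps, weight decay 0.0001, gradient clipping at 1.0, gradient accumulation over 24 steps, bfloat16 autocast with TF32 enabled, and `torch.compile`. It uses a vanilla `AdamW`; the "with scaling" qualifier and the data pipeline (batch size 120, input length 1024, MLM probability 0.15) are not reproduced here, and the `model(**batch).loss` call assumes a Hugging Face-style model interface, so treat this as an approximation rather than the training script actually used.

```python
import math

import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# Hyperparameters from the README's "details" section.
BASE_LR = 0.008
FINAL_LR = 1e-5
WARMUP_STEPS = 10_000
TOTAL_STEPS = 80_000
WEIGHT_DECAY = 1e-4
GRAD_CLIP = 1.0
GRAD_ACCUM_STEPS = 24

# "Precision: tf32" for matmuls/convolutions on GPU.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True


def lr_lambda(step: int) -> float:
    """Linear warmup, then cosine decay from BASE_LR down to FINAL_LR."""
    if step < WARMUP_STEPS:
        return step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))
    return (FINAL_LR + (BASE_LR - FINAL_LR) * cosine) / BASE_LR


def train(model, data_loader):
    model = torch.compile(model)  # "Model compilation: enabled"
    optimizer = AdamW(model.parameters(), lr=BASE_LR, weight_decay=WEIGHT_DECAY)
    scheduler = LambdaLR(optimizer, lr_lambda)

    optimizer.zero_grad(set_to_none=True)
    for step, batch in enumerate(data_loader, start=1):
        # "Precision: bfloat16" via autocast; loss scaled for accumulation.
        with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
            loss = model(**batch).loss / GRAD_ACCUM_STEPS
        loss.backward()

        if step % GRAD_ACCUM_STEPS == 0:
            torch.nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP)
            optimizer.step()
            scheduler.step()
            optimizer.zero_grad(set_to_none=True)
```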