Update README.md
README.md
CHANGED
@@ -12,3 +12,18 @@ A monolingual Slovak language model.
 
 Model was trained on a collection of Slovak web pages from various sources.
 
+## Training parameters
+
+We used 4 x A100 40GB GPU for 14 hours.
+
+- Effective batch size: 192
+- Sequence length 512
+- Training Steps 120 000.
+- warmup_steps 1000
+- optimizer adamw
+- Per device batch size 48
+- mixed_precision bf16
+- weight decay 0.01
+- gradient clipping 1.0
+- learning_rate 1e-5
+- scheduler cosine
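
The parameters listed in this change are consistent with each other, and a short sketch can make the arithmetic explicit. The Python below is not taken from the repository. It assumes no gradient accumulation (4 GPUs x 48 per device already gives 192) and assumes the cosine scheduler means linear warmup to the peak learning rate followed by cosine decay to zero, which the README does not state. All function and constant names are hypothetical.

```python
import math

# Values copied from the "Training parameters" list in the diff.
NUM_GPUS = 4
PER_DEVICE_BATCH_SIZE = 48
SEQUENCE_LENGTH = 512
TRAINING_STEPS = 120_000
WARMUP_STEPS = 1_000
PEAK_LEARNING_RATE = 1e-5

# Assumption: no gradient accumulation, since 4 * 48 already equals 192.
GRADIENT_ACCUMULATION_STEPS = 1


def effective_batch_size():
    """Sequences consumed per optimizer step across all GPUs."""
    return NUM_GPUS * PER_DEVICE_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS


def max_tokens_seen():
    """Upper bound on tokens processed, assuming every sequence is full length."""
    return effective_batch_size() * SEQUENCE_LENGTH * TRAINING_STEPS


def learning_rate(step):
    """Assumed schedule: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return PEAK_LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size())  # 192
print(max_tokens_seen())       # 11796480000
```

Under these assumptions the run processes at most about 11.8 billion tokens, and the learning rate reaches its peak of 1e-5 at step 1000 before decaying to zero at step 120 000.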