Slovak RoBERTa Base

A monolingual Slovak language model.

The model was trained on a collection of Slovak web pages from various sources.
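
The model can be used for masked-token prediction with the Hugging Face `transformers` library. A minimal sketch using the standard fill-mask pipeline (the example sentence is illustrative):

```python
from transformers import pipeline

# Load the model from the Hugging Face Hub for masked-token prediction.
fill_mask = pipeline("fill-mask", model="TUKE-KEMT/slovak-roberta-base")

# RoBERTa tokenizers typically use <mask> as the mask token.
for prediction in fill_mask("Bratislava je hlavné mesto <mask>."):
    print(prediction["token_str"], round(prediction["score"], 3))
```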

Training parameters

We trained on 4 × A100 40 GB GPUs for 14 hours.

  • Effective batch size: 192
  • Sequence length: 512
  • Training steps: 120 000
  • Warmup steps: 1 000
  • Optimizer: AdamW
  • Per-device batch size: 48
  • Mixed precision: bf16
  • Weight decay: 0.01
  • Gradient clipping: 1.0
  • Learning rate: 1e-5
  • Scheduler: cosine
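
The card does not state which training framework was used; as one illustration, the listed hyperparameters map onto Hugging Face `TrainingArguments` roughly as follows (a hypothetical sketch, not the authors' actual configuration):

```python
from transformers import TrainingArguments

# Hypothetical mapping of the listed hyperparameters. The sequence
# length of 512 would be enforced by the tokenizer/data collator,
# not by TrainingArguments.
training_args = TrainingArguments(
    output_dir="slovak-roberta-base",
    per_device_train_batch_size=48,  # 48 per device x 4 GPUs = effective 192
    max_steps=120_000,
    warmup_steps=1_000,
    optim="adamw_torch",             # AdamW optimizer
    bf16=True,                       # mixed precision
    weight_decay=0.01,
    max_grad_norm=1.0,               # gradient clipping
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
)
```
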
Model size: 125M parameters, stored as F32 tensors in the safetensors format.
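
The reported parameter count can be checked directly (a quick sketch; `AutoModelForMaskedLM` is assumed as the model class):

```python
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("TUKE-KEMT/slovak-roberta-base")
print(f"{sum(p.numel() for p in model.parameters()):,}")  # ~125M parameters
```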
