---
license: apache-2.0
language:
- la
base_model:
- ClassCat/roberta-base-latin-v2
---

This model is fine-tuned from ClassCat/roberta-base-latin-v2 on The Latin Library - 15M Token dataset.

The dataset was cleaned as follows:

- Removal of all "pseudo-Latin" text (e.g. "Lorem ipsum ...").
- Sentence splitting and normalisation with CLTK (the Classical Language Toolkit).
- Deduplication of the corpus.
- Lowercasing of all text.
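The cleaning steps above can be sketched roughly as follows. This is a minimal, standard-library-only approximation and not the pipeline actually used for this model. A simple regex splitter and Unicode NFC normalisation stand in for CLTK's sentence tokenizer and normaliser. The `PSEUDO_LATIN_MARKERS` list, the document-level filter, and exact-match deduplication are assumptions made for illustration.

```python
import re
import unicodedata

# Hypothetical marker list: the card only says "pseudo-Latin" text such as
# "Lorem ipsum ..." was removed; the exact filter used is not documented.
PSEUDO_LATIN_MARKERS = ("lorem ipsum",)

# Stand-in for CLTK's Latin sentence tokenizer: split after ".", "?" or "!"
# when followed by whitespace. CLTK's real splitter handles far more cases.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.?!])\s+")


def normalise(sentence: str) -> str:
    """Rough stand-in for CLTK normalisation: NFC-normalise Unicode
    and collapse runs of whitespace into single spaces."""
    sentence = unicodedata.normalize("NFC", sentence)
    return " ".join(sentence.split())


def clean_corpus(documents: list[str]) -> list[str]:
    """Return cleaned, lowercased, deduplicated sentences in first-seen order."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for doc in documents:
        # Drop whole documents that contain pseudo-Latin filler.
        if any(marker in doc.lower() for marker in PSEUDO_LATIN_MARKERS):
            continue
        # Split into sentences and normalise each one.
        for raw in SENTENCE_BOUNDARY.split(doc):
            sentence = normalise(raw)
            if not sentence:
                continue
            # Lowercase before deduplicating so case variants collapse.
            sentence = sentence.lower()
            # Exact-match deduplication, keeping the first occurrence.
            if sentence in seen:
                continue
            seen.add(sentence)
            cleaned.append(sentence)
    return cleaned


if __name__ == "__main__":
    docs = [
        "Gallia est omnis divisa in partes tres. Quarum unam incolunt Belgae.",
        "Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
        "GALLIA EST OMNIS DIVISA IN PARTES TRES.",
    ]
    print(clean_corpus(docs))
    # ['gallia est omnis divisa in partes tres.', 'quarum unam incolunt belgae.']
```

The card lists deduplication before lowercasing; this sketch lowercases first so that sentences differing only in case are treated as duplicates. Whether the original pipeline did the same is not stated.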