---
license: apache-2.0
language:
- la
base_model:
- ClassCat/roberta-base-latin-v2
---

This model was fine-tuned on the "The Latin Library - 15M Token" dataset.

The dataset was cleaned as follows:

- Removal of all "pseudo-Latin" text ("Lorem ipsum ...").
- Use of CLTK for sentence splitting and normalisation.
- Deduplication of the corpus.
- Lowercasing of all text.
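
The cleaning steps listed above can be illustrated with a minimal sketch. This is not the actual preprocessing script, which is not published here. It assumes that any document containing "lorem ipsum" is dropped entirely and that deduplication happens at sentence level. It also uses a naive regex splitter as a stand-in for CLTK's sentence splitting and normalisation. The names `PSEUDO_LATIN`, `split_sentences`, and `clean_corpus` are hypothetical.

```python
import re

# Assumption: placeholder text is detected by the phrase "lorem ipsum".
# The filter actually used for the dataset is not documented.
PSEUDO_LATIN = re.compile(r"lorem\s+ipsum", re.IGNORECASE)


def split_sentences(text):
    """Naive stand-in for CLTK sentence splitting: break after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [part.strip() for part in parts if part.strip()]


def clean_corpus(documents):
    """Drop pseudo-Latin documents, split into sentences, lowercase, deduplicate."""
    seen = set()
    cleaned = []
    for doc in documents:
        # Assumption: a document with placeholder text is discarded as a whole.
        if PSEUDO_LATIN.search(doc):
            continue
        for sentence in split_sentences(doc):
            # Lowercase and collapse runs of whitespace.
            sentence = " ".join(sentence.lower().split())
            # Keep only the first occurrence of each sentence.
            if sentence and sentence not in seen:
                seen.add(sentence)
                cleaned.append(sentence)
    return cleaned
```

Because the training text was lowercased, lowercasing input text before passing it to the model is likely to match the training distribution more closely.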