BETA Historical Swedish Donut

This model extends the base training of naver-clova-ix/donut-base with a "learn to read" training phase focused on historical handwritten Swedish. It was trained to transcribe paragraphs of 1-15 lines of handwritten text taken from documents dating from 1600 to 1900. The model must be fine-tuned before downstream use.
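As a starting point for fine-tuning, a Donut-style checkpoint can typically be loaded through the transformers library. The sketch below is a minimal example under assumptions: it defaults to the base checkpoint named above, because the repository id of this model is not stated here, and it assumes the checkpoint ships a processor configuration that DonutProcessor can read. The helper name load_donut is hypothetical.

```python
# Base checkpoint named in the text above. The repository id of this model
# itself is not given here, so pass it explicitly once you know it.
BASE_CHECKPOINT = "naver-clova-ix/donut-base"


def load_donut(checkpoint: str = BASE_CHECKPOINT):
    """Return (processor, model) for a Donut-style checkpoint.

    Hypothetical helper: assumes the checkpoint provides both processor and
    model files in the layout used by naver-clova-ix/donut-base.
    """
    # Imported lazily so this module can be imported without transformers.
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    return processor, model
```

Calling load_donut() downloads the weights on first use. The returned model is an ordinary VisionEncoderDecoderModel, so fine-tuning on a downstream transcription task can follow whatever training loop you already use for that class.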

This model is still under development.

Known issues

The model tends to produce empty transcriptions for shorter paragraphs (1-5 lines).
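Until this is resolved, downstream code may want to detect empty outputs and route them for review instead of silently accepting them. The following is a minimal sketch, not part of the model itself: the function names are hypothetical, and the pattern used to strip special tokens (anything of the form <...>) is an assumption about how the decoded output looks.

```python
import re

# Assumption: special tokens appear as <...> in the decoded string
# (for example <s>, </s>, <pad>). Adjust to the tokenizer actually used.
SPECIAL_TOKEN = re.compile(r"<[^<>\s]+>")


def clean_transcription(raw: str) -> str:
    """Remove special tokens and surrounding whitespace from decoded output."""
    return SPECIAL_TOKEN.sub("", raw).strip()


def is_empty_transcription(raw: str) -> bool:
    """True if nothing but special tokens and whitespace was produced."""
    return clean_transcription(raw) == ""


def flag_empty(results: dict[str, str]) -> list[str]:
    """Return the keys (e.g. image names) whose transcription came out empty."""
    return [name for name, raw in results.items() if is_empty_transcription(raw)]
```

For example, flag_empty({"a.png": "<s></s>", "b.png": "<s>Anno 1723</s>"}) returns only "a.png", which can then be re-run or sent for manual transcription.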

Training data

The training data was sourced from Riksarkivet's HTR training data (most of which is available on Hugging Face) and the Norhand v3 dataset.