BETA Historical Swedish Donut

This model extends the base training of naver-clova-ix/donut-base with a "learn to read" training phase focused on historical handwritten Swedish. It was trained to transcribe paragraphs of 1-15 lines of handwritten text from documents dating from 1600-1900. The model must be fine-tuned before downstream use.
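As a rough illustration of how a Donut checkpoint of this kind might be run on a single paragraph image, here is a minimal sketch using the `DonutProcessor` and `VisionEncoderDecoderModel` classes from the transformers library. The function name `transcribe` is hypothetical. This card does not state the model's repository ID or the task prompt used during training, so both are left as required arguments rather than guessed. The `max_length` default is likewise an assumption.

```python
def transcribe(image_path: str, model_id: str, task_prompt: str, max_length: int = 512) -> str:
    """Transcribe one paragraph image with a Donut-style checkpoint.

    model_id and task_prompt are not documented in this card and must be
    supplied by the caller; max_length=512 is an assumed default.
    """
    # Imported locally so that defining this function does not require
    # the heavy dependencies to be installed.
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    model.eval()

    # Donut expects RGB input; the processor handles resizing and normalization.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # The decoder is primed with the task prompt token(s).
    decoder_input_ids = processor.tokenizer(
        task_prompt, add_special_tokens=False, return_tensors="pt"
    ).input_ids

    output_ids = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=max_length,
    )
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
```

Since the model is intended to be fine-tuned first, output from the checkpoint as published should be treated as a starting point rather than a finished transcription.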

This model is still under development.

Known issues

The model has a tendency to produce empty transcriptions of shorter paragraphs (1-5 lines).
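Until this is resolved, one possible workaround is to flag empty outputs for manual review or re-processing. The sketch below is not part of the model or its tooling; `Transcription`, `find_empty_transcriptions`, and `SHORT_PARAGRAPH_MAX_LINES` are hypothetical names, and the line count is assumed to come from a separate layout or segmentation step.

```python
from dataclasses import dataclass
from typing import List, Optional

# The issue is reported for paragraphs of 1-5 lines.
SHORT_PARAGRAPH_MAX_LINES = 5


@dataclass
class Transcription:
    paragraph_id: str
    text: str
    # Number of lines in the source paragraph, if known (assumed to come
    # from an external segmentation step).
    line_count: Optional[int] = None

    @property
    def is_short(self) -> bool:
        """True if the paragraph falls in the range where the issue is reported."""
        return self.line_count is not None and self.line_count <= SHORT_PARAGRAPH_MAX_LINES


def find_empty_transcriptions(results: List[Transcription]) -> List[Transcription]:
    """Return the transcriptions whose text is empty or whitespace only."""
    return [r for r in results if not r.text.strip()]
```

The `is_short` property can then be used to separate empty outputs that match the known issue from empty outputs on longer paragraphs, which may point to a different problem.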

Training data

The training data was sourced from Riksarkivet's HTR training data (most of which is available on Hugging Face) and the Norhand v3 dataset.
