Historic Language Modeling

biglam's Collections

updated Oct 31, 2023

This collection contains models, datasets and spaces related to historic language models.

dbmdz/bert-base-historic-multilingual-cased

Fill-Mask • 0.1B • Updated Sep 6, 2023 • 251 • 8

Note A multilingual (German, English, French, Swedish, and Finnish) BERT model (hmBERT).
dbmdz/bert-base-historic-multilingual-64k-td-cased

Fill-Mask • 0.1B • Updated Sep 6, 2023 • 42 • 1

Note A historic multilingual (German, English, French, Swedish, and Finnish) BERT model (hmBERT) with 64k vocab size.
Riksarkivet/bert-base-cased-swe-historical

Fill-Mask • 0.1B • Updated Oct 11, 2023 • 41 • 3

Note A historical Swedish BERT model released by the Swedish National Archives (Riksarkivet) to generalise better to historical Swedish text.
dell-research-harvard/AmericanStories

Updated Mar 26 • 9.47k • 148

Note The American Stories dataset is a collection of full article texts extracted from historical U.S. newspaper images. It includes nearly 20 million scans from the public domain Chronicling America collection maintained by the Library of Congress.
dell-research-harvard/headlines-semantic-similarity

Viewer • Updated Jun 7, 2024 • 34.9M • 1.42k • 13

Note HEADLINES is a massive English-language semantic similarity dataset, containing 396,001,930 pairs of different headlines for the same newspaper article, taken from historical U.S. newspapers, covering the period 1920-1989.
dbmdz/bert-medium-historic-multilingual-cased

Fill-Mask • 0.0B • Updated Sep 6, 2023 • 84

Note A medium-sized 8-layer historic multilingual (German, English, French, Swedish, and Finnish) BERT model (hmBERT).
dbmdz/bert-small-historic-multilingual-cased

Fill-Mask • 0.0B • Updated Sep 6, 2023 • 256 • 1

Note A small-sized 4-layer historic multilingual (German, English, French, Swedish, and Finnish) BERT model (hmBERT).
dbmdz/bert-mini-historic-multilingual-cased

Fill-Mask • 0.0B • Updated Sep 6, 2023 • 1.04k • 3

Note A mini-sized 4-layer historic multilingual (German, English, French, Swedish, and Finnish) BERT model (hmBERT).
dbmdz/bert-tiny-historic-multilingual-cased

Fill-Mask • 0.0B • Updated Sep 6, 2023 • 196 • 1

Note A tiny 2-layer historic multilingual (German, English, French, Swedish, and Finnish) BERT model (hmBERT).
dbmdz/bert-base-historic-english-cased

Fill-Mask • 0.1B • Updated Feb 8, 2024 • 51 • 1

Note A BERT model pretrained on historic English text.
hmbyt5/byt5-small-english

Text2Text Generation • 0.3B • Updated Oct 28, 2024 • 31 • 1

Note A ByT5 token-free model pretrained on historic English books.
dbmdz/bert-base-finnish-europeana-cased

Fill-Mask • 0.1B • Updated Nov 13, 2024 • 63

Note A BERT model pretrained on historic Finnish newspapers.
dbmdz/bert-base-german-europeana-cased

0.1B • Updated Oct 28, 2024 • 3.48k • 4

Note A BERT model pretrained on historic German newspapers.
dbmdz/bert-base-german-europeana-uncased

0.1B • Updated Dec 12, 2024 • 60 • 5

Note A BERT model pretrained on historic German newspapers with uncased vocabulary.
dbmdz/electra-base-german-europeana-cased-discriminator

0.1B • Updated Mar 4 • 48

Note An ELECTRA model pretrained on historic German newspapers.
dbmdz/electra-base-german-europeana-cased-generator

Fill-Mask • 0.0B • Updated Sep 6, 2023 • 35

Note An ELECTRA model pretrained on historic German newspapers.
dbmdz/convbert-base-german-europeana-cased

Feature Extraction • 0.1B • Updated Sep 18, 2023 • 108 • 2

Note A ConvBERT model pretrained on historic German newspapers.
dbmdz/distilbert-base-german-europeana-cased

0.1B • Updated Oct 28, 2024 • 47 • 8

Note A DistilBERT model pretrained on historic German newspapers.
dbmdz/bert-base-french-europeana-cased

Updated Sep 13, 2021 • 45.2k • 4

Note A BERT model pretrained on historic French newspapers.
dbmdz/electra-base-french-europeana-cased-discriminator

0.1B • Updated Feb 24 • 48 • 1

Note An ELECTRA model pretrained on historic French newspapers.
dbmdz/electra-base-french-europeana-cased-generator

Fill-Mask • 0.0B • Updated Sep 6, 2023 • 2.92k

Note An ELECTRA model pretrained on historic French newspapers.
hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax

Text2Text Generation • Updated Oct 27, 2023 • 20

Note A historic multilingual (German, English, French, Swedish, Finnish, Dutch and Norwegian) ByT5 token-free model pretrained on various corpora.
hmteams/teams-base-historic-multilingual-generator

Fill-Mask • 0.1B • Updated Feb 3 • 40 • 1

Note A historic multilingual (German, English, French, Swedish, Finnish, Dutch and Norwegian) TEAMS model pretrained on various corpora.
hmteams/teams-base-historic-multilingual-discriminator

0.1B • Updated Feb 3 • 27

Note A historic multilingual (German, English, French, Swedish, Finnish, Dutch and Norwegian) TEAMS model pretrained on various corpora.
British Library Books Genre Classifier V2 📚
Pclanglais/Larth-Mistral

Text Generation • Updated Oct 21, 2023 • 25 • 5
Livingwithmachines/erwt-year-masked-25

Fill-Mask • Updated Nov 21, 2022 • 19

Note Experimental DistilBERT model trained on British newspaper data and metadata. It can be used to predict the year of a text snippet, or to adjust the prediction of masked tokens to a text's date of publication.
Livingwithmachines/erwt-year-masked-75

Fill-Mask • Updated Nov 21, 2022 • 28

Note Experimental DistilBERT model trained on British newspaper data and metadata. It can be used to predict the year of a text snippet, or to adjust the prediction of masked tokens to a text's date of publication.
Livingwithmachines/erwt-year

Fill-Mask • Updated Nov 24, 2022 • 39

Note Experimental DistilBERT model trained on British newspaper data and metadata. It can be used to predict the year of a text snippet, or to adjust the prediction of masked tokens to a text's date of publication.
Livingwithmachines/bert_1760_1900

Fill-Mask • Updated Jul 18, 2022 • 40

Note BERT model fine-tuned on a collection of (mostly) nineteenth-century books published between 1760 and 1900.
Livingwithmachines/toponym-19thC-en

Token Classification • Updated Jul 18, 2023 • 22 • 2

Note Fine-tuned from a model of 19th-century text and trained to detect toponyms (place names) in historical collections.
Livingwithmachines/bert_1890_1900

Fill-Mask • 0.1B • Updated Mar 22, 2023 • 1.12k

Note BERT model fine-tuned on a collection of nineteenth-century books published between 1890 and 1900.
Livingwithmachines/bert_1760_1850

Fill-Mask • Updated Jul 18, 2022 • 35 • 1

Note BERT model fine-tuned on a collection of (mostly) nineteenth-century books published between 1760 and 1850.
Livingwithmachines/bert_1850_1875

Fill-Mask • Updated Jul 18, 2022 • 28

Note BERT model fine-tuned on a collection of (mostly) nineteenth-century books published between 1850 and 1875.
Livingwithmachines/bert_1875_1890

Fill-Mask • Updated Jul 18, 2022 • 33

Note BERT model fine-tuned on a collection of (mostly) nineteenth-century books published between 1875 and 1890.
emanjavacas/GysBERT

Updated Nov 28, 2023 • 89 • 5
emanjavacas/MacBERTh

Updated Nov 28, 2023 • 4.82k • 7
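
Note Many of the models listed above are tagged Fill-Mask, meaning they predict a hidden token in a sentence. The following is a minimal sketch of how one of them might be queried with the Hugging Face `transformers` library, assuming that library is installed and network access is available. The `<mask>` placeholder and the helper names `build_masked_sentence`, `top_predictions` and `query_model` are hypothetical, introduced only for illustration; the model id is taken from the listing.

```python
from typing import Dict, List, Tuple

# Model id copied from the collection listing; any Fill-Mask entry could be used instead.
MODEL_ID = "dbmdz/bert-base-historic-multilingual-cased"


def build_masked_sentence(template: str, mask_token: str) -> str:
    """Swap the illustrative <mask> placeholder for the model's own mask token."""
    if "<mask>" not in template:
        raise ValueError("template must contain the <mask> placeholder")
    return template.replace("<mask>", mask_token, 1)


def top_predictions(predictions: List[Dict], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (token, score) pairs from fill-mask output."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"], round(p["score"], 3)) for p in ranked[:k]]


def query_model(template: str, model_id: str = MODEL_ID, k: int = 3) -> List[Tuple[str, float]]:
    """Download the model (network access required) and fill the masked position."""
    # Imported lazily so the two helpers above can be used without transformers.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_id)
    sentence = build_masked_sentence(template, fill_mask.tokenizer.mask_token)
    return top_predictions(fill_mask(sentence, top_k=k), k)
```

Calling `query_model("The <mask> arrived at the harbour this morning.")` would download the model on first use and return its top three candidate tokens with scores. The actual predictions depend on the model and are not shown here.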
