Historic Language Modeling
This collection contains models, datasets, and Spaces related to historic language modeling.
Fill-Mask
Note: A multilingual (German, English, French, Swedish, and Finnish) BERT model (hmBERT).
dbmdz/bert-base-historic-multilingual-64k-td-cased
Fill-Mask
Note: A historic multilingual (German, English, French, Swedish, and Finnish) BERT model (hmBERT) with a 64k vocabulary.
Riksarkivet/bert-base-cased-swe-historical
Fill-Mask
Note: A historical Swedish BERT model released by the Swedish National Archives (Riksarkivet) to generalise better to historical Swedish text.
dell-research-harvard/AmericanStories
Note: The American Stories dataset is a collection of full article texts extracted from historical U.S. newspaper images. It includes nearly 20 million scans from the public domain Chronicling America collection maintained by the Library of Congress.
dell-research-harvard/headlines-semantic-similarity
Note: HEADLINES is a massive English-language semantic similarity dataset containing 396,001,930 pairs of different headlines for the same newspaper article, taken from historical U.S. newspapers and covering the period 1920-1989.
dbmdz/bert-medium-historic-multilingual-cased
Fill-Mask
Note: A medium-sized 8-layer historic multilingual (German, English, French, Swedish, and Finnish) BERT model (hmBERT).
dbmdz/bert-small-historic-multilingual-cased
Fill-Mask
Note: A small-sized 4-layer historic multilingual (German, English, French, Swedish, and Finnish) BERT model (hmBERT).
dbmdz/bert-mini-historic-multilingual-cased
Fill-Mask
Note: A mini-sized 4-layer historic multilingual (German, English, French, Swedish, and Finnish) BERT model (hmBERT).
dbmdz/bert-tiny-historic-multilingual-cased
Fill-Mask
Note: A tiny 2-layer historic multilingual (German, English, French, Swedish, and Finnish) BERT model (hmBERT).
dbmdz/bert-base-historic-english-cased
Fill-Mask
Note: A BERT model pretrained on historic English text.
hmbyt5/byt5-small-english
Text2Text Generation
Note: A token-free ByT5 model pretrained on historic English books.
dbmdz/bert-base-finnish-europeana-cased
Fill-Mask
Note: A BERT model pretrained on historic Finnish newspapers.
dbmdz/bert-base-german-europeana-cased
Note: A BERT model pretrained on historic German newspapers.
dbmdz/bert-base-german-europeana-uncased
Note: A BERT model pretrained on historic German newspapers with an uncased vocabulary.
dbmdz/electra-base-german-europeana-cased-discriminator
Note: An ELECTRA model pretrained on historic German newspapers.
dbmdz/electra-base-german-europeana-cased-generator
Fill-Mask
Note: An ELECTRA model pretrained on historic German newspapers.
dbmdz/convbert-base-german-europeana-cased
Feature Extraction
Note: A ConvBERT model pretrained on historic German newspapers.
dbmdz/distilbert-base-german-europeana-cased
Note: A DistilBERT model pretrained on historic German newspapers.
dbmdz/bert-base-french-europeana-cased
Note: A BERT model pretrained on historic French newspapers.
dbmdz/electra-base-french-europeana-cased-discriminator
Note: An ELECTRA model pretrained on historic French newspapers.
dbmdz/electra-base-french-europeana-cased-generator
Fill-Mask
Note: An ELECTRA model pretrained on historic French newspapers.
hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax
Text2Text Generation
Note: A historic multilingual (German, English, French, Swedish, Finnish, Dutch, and Norwegian) token-free ByT5 model pretrained on various corpora.
hmteams/teams-base-historic-multilingual-generator
Fill-Mask
Note: A historic multilingual (German, English, French, Swedish, Finnish, Dutch, and Norwegian) TEAMS model pretrained on various corpora.
hmteams/teams-base-historic-multilingual-discriminator
Note: A historic multilingual (German, English, French, Swedish, Finnish, Dutch, and Norwegian) TEAMS model pretrained on various corpora.
British Library Books Genre Classifier V2
Pclanglais/Larth-Mistral
Text Generation
Livingwithmachines/erwt-year-masked-25
Fill-Mask
Note: An experimental DistilBERT model trained on British newspaper data and metadata. It can be used to predict the year of a text snippet or to adjust the prediction of masked tokens to a text's date of publication.
Livingwithmachines/erwt-year-masked-75
Fill-Mask
Note: An experimental DistilBERT model trained on British newspaper data and metadata. It can be used to predict the year of a text snippet or to adjust the prediction of masked tokens to a text's date of publication.
Livingwithmachines/erwt-year
Fill-Mask
Note: An experimental DistilBERT model trained on British newspaper data and metadata. It can be used to predict the year of a text snippet or to adjust the prediction of masked tokens to a text's date of publication.
Livingwithmachines/bert_1760_1900
Fill-Mask
Note: A BERT model fine-tuned on a collection of (mostly) nineteenth-century books published between 1760 and 1900.
Livingwithmachines/toponym-19thC-en
Token Classification
Note: A model built on a 19th-century language model and fine-tuned to detect toponyms (place names) in historical collections.
Livingwithmachines/bert_1890_1900
Fill-Mask
Note: A BERT model fine-tuned on a collection of nineteenth-century books published between 1890 and 1900.
Livingwithmachines/bert_1760_1850
Fill-Mask
Note: A BERT model fine-tuned on a collection of (mostly) nineteenth-century books published between 1760 and 1850.
Livingwithmachines/bert_1850_1875
Fill-Mask
Note: A BERT model fine-tuned on a collection of (mostly) nineteenth-century books published between 1850 and 1875.
Livingwithmachines/bert_1875_1890
Fill-Mask
Note: A BERT model fine-tuned on a collection of (mostly) nineteenth-century books published between 1875 and 1890.
emanjavacas/GysBERT
emanjavacas/MacBERTh
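
The entries tagged Fill-Mask in this collection can typically be queried through the `fill-mask` pipeline of the Hugging Face `transformers` library. The following is a minimal sketch under stated assumptions, not a recipe taken from any of the model cards: it assumes `transformers` and a backend such as PyTorch are installed, and the helper names `load_fill_mask`, `top_predictions`, and `demo`, as well as the example sentence, are hypothetical. The model ID is one of the entries listed above.

```python
from typing import Callable, Dict, List, Tuple


def load_fill_mask(model_id: str):
    """Build a fill-mask pipeline for a Hub model ID (downloads the weights)."""
    # Imported lazily so top_predictions() stays usable without transformers.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)


def top_predictions(
    fill_mask: Callable[[str], List[Dict]], text: str, k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (token, score) pairs for one masked slot.

    Assumes `text` contains exactly one mask token, so the pipeline returns
    a flat list of dicts with "token_str" and "score" keys.
    """
    results = fill_mask(text)
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["token_str"].strip(), round(float(r["score"]), 4)) for r in ranked[:k]]


def demo() -> None:
    """Query one model from the collection (needs network access)."""
    fill_mask = load_fill_mask("dbmdz/bert-base-historic-english-cased")
    # Read the mask token from the tokenizer instead of hard-coding "[MASK]".
    mask = fill_mask.tokenizer.mask_token
    sentence = f"The {mask} sailed from London to Boston."  # hypothetical input
    for token, score in top_predictions(fill_mask, sentence):
        print(f"{token}\t{score}")
```

Calling `demo()` downloads the model on first use and prints the top candidates for the masked word. Swapping in another Fill-Mask model ID from the list should work the same way, although models with special input conventions may need their inputs prepared as described on their own model cards.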