German Wikipedia LMs

non-profit

AI & ML interests

language modeling

German Wikipedia LMs (GWLMs)

We present language models (BERT, BERT with Token Dropping, TEAMS, T5) pretrained on German Wikipedia.

This is an ongoing project!
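A minimal sketch of how one of the masked language models might be queried, assuming the `transformers` library is installed and that `gwlms/bert-base-dewiki-v1` (listed as a Fill-Mask model on this page) loads with the standard `fill-mask` pipeline. The `___` placeholder and the helper names `insert_mask` and `predict_blank` are hypothetical, introduced only for illustration.

```python
from typing import List, Tuple

PLACEHOLDER = "___"  # hypothetical blank marker used only in this sketch


def insert_mask(template: str, mask_token: str) -> str:
    """Replace the single placeholder in `template` with the model's mask token."""
    if template.count(PLACEHOLDER) != 1:
        raise ValueError("template must contain exactly one placeholder")
    return template.replace(PLACEHOLDER, mask_token)


def predict_blank(
    template: str,
    model_id: str = "gwlms/bert-base-dewiki-v1",  # id taken from the model list
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Return (token, score) pairs for the blank. Downloads the model on first use."""
    from transformers import pipeline  # imported lazily; needs `pip install transformers`

    fill = pipeline("fill-mask", model=model_id)
    # Ask the tokenizer for its mask token instead of hard-coding "[MASK]".
    text = insert_mask(template, fill.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in fill(text, top_k=top_k)]
```

Calling `predict_blank("Berlin ist die ___ von Deutschland.")` would then return the model's top candidates for the blank; the actual predictions depend on the checkpoint.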

German Wikipedia Corpus

We use a recent Wikipedia dump, which can be accessed here. Additionally, a sentence-segmented version (using NLTK) is available here.
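The sentence segmentation step could look roughly like the sketch below. It assumes NLTK's Punkt model for German. The output layout (one sentence per line, with an empty string marking each article boundary) is an assumption modeled on common BERT pretraining input, not a confirmed description of the published corpus. The function names are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def load_german_splitter() -> Callable[[str], List[str]]:
    """Return a German sentence splitter backed by NLTK's Punkt model."""
    import nltk  # imported lazily; needs `pip install nltk`

    # Newer NLTK releases may require the "punkt_tab" resource instead.
    nltk.download("punkt", quiet=True)
    return lambda text: nltk.sent_tokenize(text, language="german")


def segment_articles(
    articles: Iterable[str],
    split: Callable[[str], List[str]],
) -> Iterator[str]:
    """Yield one sentence at a time, followed by "" after each article."""
    for article in articles:
        for paragraph in article.splitlines():
            paragraph = paragraph.strip()
            if not paragraph:
                continue  # skip blank lines inside an article
            for sentence in split(paragraph):
                sentence = sentence.strip()
                if sentence:
                    yield sentence
        yield ""  # assumed document separator
```

With NLTK installed, `segment_articles(texts, load_german_splitter())` produces the sentence stream; passing the splitter in as an argument keeps the segmentation logic independent of the NLTK download.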

Fine-tuned Models

We fine-tuned NER models using the SpanMarker library on the GermEval 2014 NER dataset and uploaded the best models.
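One of the uploaded NER checkpoints can probably be used for inference along these lines. This is a sketch, assuming the `span_marker` package is installed and that `gwlms/span-marker-bert-germeval14` (taken from the model list on this page) loads via `SpanMarkerModel.from_pretrained`. The helpers `tag_entities` and `format_entities` and the `min_score` threshold are hypothetical additions for illustration.

```python
from typing import Any, Dict, List

MODEL_ID = "gwlms/span-marker-bert-germeval14"  # id taken from the model list


def tag_entities(text: str, model_id: str = MODEL_ID) -> List[Dict[str, Any]]:
    """Run NER on `text`. Downloads the model on first use."""
    from span_marker import SpanMarkerModel  # needs `pip install span_marker`

    model = SpanMarkerModel.from_pretrained(model_id)
    # predict() returns dicts with "span", "label" and "score" keys, among others.
    return model.predict(text)


def format_entities(
    entities: List[Dict[str, Any]],
    min_score: float = 0.5,  # hypothetical confidence cut-off
) -> List[str]:
    """Keep entities at or above `min_score` and render them as 'span [label]'."""
    return [
        f'{entity["span"]} [{entity["label"]}]'
        for entity in entities
        if entity["score"] >= min_score
    ]
```

For example, `format_entities(tag_entities("Angela Merkel besuchte Berlin."))` would list the detected spans with their labels; which labels appear depends on the GermEval 2014 tag set used by the checkpoint.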

Acknowledgements

Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC). Many thanks for providing access to the TPUs ❤️

models 13

gwlms/t5-efficient-small-dewiki-v1

0.1B • Updated Apr 19, 2024 • 1

gwlms/byt5-small-dewiki-v1

0.3B • Updated Apr 19, 2024 • 2

gwlms/t5-efficient-base-dewiki-v1

0.6B • Updated Apr 19, 2024 • 1

gwlms/span-marker-bert-germeval14

Token Classification • 0.1B • Updated Apr 19, 2024 • 5

gwlms/span-marker-token-dropping-bert-germeval14

Token Classification • 0.1B • Updated Apr 19, 2024 • 2

gwlms/span-marker-teams-germeval14

Token Classification • 0.1B • Updated Apr 19, 2024 • 1

gwlms/teams-base-dewiki-v1-discriminator

0.1B • Updated Apr 19, 2024 • 3

gwlms/bert-base-token-dropping-dewiki-v1

Fill-Mask • 0.1B • Updated Sep 6, 2023 • 1

gwlms/t5-efficient-large-dewiki-v1

1B • Updated Aug 7, 2023 • 1

gwlms/bert-base-dewiki-v1

Fill-Mask • 0.1B • Updated Aug 1, 2023

datasets 8

gwlms/dewiki-20230701-flair-corpus

Viewer • Updated Jun 10, 2024 • 45.6M • 123

gwlms/validation

Viewer • Updated Jan 5, 2024 • 15.6k • 19

gwlms/biofid

Updated Aug 23, 2023 • 9

gwlms/germeval2018

Updated Jul 26, 2023 • 29

gwlms/dewiki-20230701-chunks

Updated Jul 19, 2023 • 414

gwlms/dewiki-20230701-tfrecords-dupe5

Updated Jul 19, 2023 • 636

gwlms/dewiki-20230701-nltk-corpus

Viewer • Updated Jul 19, 2023 • 61.6M • 23

gwlms/dewiki-20230701

Viewer • Updated Jul 19, 2023 • 2.73M • 17