Danish Text Datasets
This collection gathers high-quality Danish text datasets for pre-training, fine-tuning, and related uses.
Usage: Sentiment Analysis. Danish tweets annotated for sentiment.
DDSC/europarl
Usage: Sentiment Analysis. Samples from the European Parliament annotated for sentiment by the Alexandra Institute.
DDSC/lcc
Usage: Sentiment Analysis. Samples from the Leipzig Corpus annotated for sentiment by Finn Aarup Nielsen.
strombergnlp/bajer_danish_misogyny
Usage: Hate-speech and misogyny classification. Social media posts annotated for misogyny.
DDSC/dkhate
Usage: Hate Speech. Danish tweets annotated for hate speech.
alexandrainst/ddisco
Usage: Discourse Coherence. Samples from Wikipedia and Reddit annotated for discourse coherence.
chcaa/dansk-ner
Usage: NER. Texts from diverse domains, sampled from the Danish Gigaword and annotated for the 18 OntoNotes named entity classes.
alexandrainst/dane
Usage: NER. A version of DDT enriched with the 4 CoNLL-2003 named entity classes.
KennethEnevoldsen/dane_plus
Usage: NER. A version of DaNE enriched by @KennethEnevoldsen to include the 18 OntoNotes named entity classes.
alexandrainst/nordjylland-news-summarization
Usage: Summarization. News datasets from TV Nord.
strombergnlp/dkstance
Usage: Stance Detection. Reddit samples annotated for stance.
strombergnlp/polstance
Usage: Political Stance Detection. Statements annotated for political stance.
strombergnlp/danfever
Usage: Fact Verification. Fact verification dataset based on Wikipedia.
danish-foundation-models/danish-gigaword
Usage: Pre-training. Danish datasets for pre-training large language models.
DDSC/reddit-da
Usage: Pre-training. Danish samples from Reddit.
alexandrainst/domsdatabasen
Usage: Pre-training, anonymization. Domsdatabasen is a database where you can find and read selected judgments delivered by the Danish Courts.
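
The repository IDs listed above can be looked up by usage and loaded programmatically. The following is a minimal sketch, not an official helper. It assumes the Hugging Face `datasets` library is installed. The names `DATASET_USAGE`, `repos_for`, and `load_listed_dataset` are hypothetical and introduced only for illustration. Individual repositories may additionally require a configuration name, accepted terms, or other arguments that are not described in this list.

```python
from typing import Dict, List

# Repository IDs and usage labels copied from the list above.
# (The first entry in the list has no repository ID, so it is omitted.)
DATASET_USAGE: Dict[str, str] = {
    "DDSC/europarl": "Sentiment Analysis",
    "DDSC/lcc": "Sentiment Analysis",
    "strombergnlp/bajer_danish_misogyny": "Hate-speech and misogyny classification",
    "DDSC/dkhate": "Hate Speech",
    "alexandrainst/ddisco": "Discourse Coherence",
    "chcaa/dansk-ner": "NER",
    "alexandrainst/dane": "NER",
    "KennethEnevoldsen/dane_plus": "NER",
    "alexandrainst/nordjylland-news-summarization": "Summarization",
    "strombergnlp/dkstance": "Stance Detection",
    "strombergnlp/polstance": "Political Stance Detection",
    "strombergnlp/danfever": "Fact Verification",
    "danish-foundation-models/danish-gigaword": "Pre-training",
    "DDSC/reddit-da": "Pre-training",
    "alexandrainst/domsdatabasen": "Pre-training, anonymization",
}


def repos_for(usage: str) -> List[str]:
    """Return repository IDs whose usage label contains `usage` (case-insensitive)."""
    needle = usage.lower()
    return [repo for repo, label in DATASET_USAGE.items() if needle in label.lower()]


def load_listed_dataset(repo_id: str, **kwargs):
    """Load one of the listed datasets from the Hugging Face Hub.

    Hypothetical convenience wrapper: it only forwards to `datasets.load_dataset`.
    Extra keyword arguments (for example `split`) are passed through unchanged.
    """
    if repo_id not in DATASET_USAGE:
        raise KeyError(f"{repo_id!r} is not in the list above")
    # Imported lazily so the lookup helpers work without the library installed.
    from datasets import load_dataset
    return load_dataset(repo_id, **kwargs)


# Example (requires network access and the `datasets` package):
# ds = load_listed_dataset("DDSC/dkhate")
# print(ds)
```

Keeping the lookup table separate from the loading call means the list can be filtered by task, for instance `repos_for("stance")`, without downloading anything.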