Hugging Face community’s Wikimedia datasets
Wikimedia datasets created by the Hugging Face community, not Wikimedia. Sorted by Wikimedia project.
Wikipedia: Article-for-deletion discussions from Wikipedia communities in English, Turkish, and German, with a focus on stance and policy prediction.
santhosh/day_in_history
Wikipedia: Historic events from Wikipedia pages, mapped to each date, with references where available.
Salesforce/wikitext
Wikipedia: Dataset based on good and featured articles on Wikipedia.
wikimedia/wit_base
Wikipedia: An image-text dataset based on Wikipedia articles and their associated images.
aiintelligentsystems/vel_commons_wikidata
Wikimedia Commons: Dataset for visual entity linking that leverages the structured data from Wikidata that the community adds to describe images, including license information.
MLCommons/speech-wikimedia
Wikimedia Commons: A dataset of Wikimedia Commons audio files and transcriptions across languages.
calm-and-collected/knives_and_time
Wikimedia Commons: Dataset of public domain images, manually collected from Wikimedia Commons, covering damaged images, paintings, and photographs (https://huggingface.co/calm-and-collected/knives_and_time).
kdm-daiict/freebase-wikidata-mapping
Wikidata: Links Wikidata to the widely used but outdated Freebase knowledge graph.
rvashurin/wikidata_simplequestions
Wikidata: The SimpleQuestions dataset, based on Wikidata.
rayliuca/WikidataLabels
Wikidata: Dataset of entity labels across languages, extracted from Wikidata.
imvladikon/paranames
Wikidata: Dataset of 118 million names across 400 languages.
mostol/wiktionary-ipa
Wiktionary: IPA strings and their respective pronunciations as audio files.
malteos/wikinews
Wikinews: Dataset of Wikinews articles without wikitext across different languages, including the revision timestamp, categories, and sources.
taln-ls2n/wikinews-fr-100
Wikinews: Dataset for keyphrase extraction and generation models, including 100 French Wikinews articles.
erhwenkuo/wikinews-zhtw
Wikinews: Dataset of cleaned Chinese Wikinews articles from 2023.
caretech-owl/wikiquote-de-quotes
Wikiquote: Dataset of German quotes and their authors from Wikiquote.
domenicrosati/TruthfulQA
Wikiquote: Dataset testing how humans' false beliefs are represented in language models. One of the sources is Wikiquote.
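
Any entry in this list is identified by its repo id (for example Salesforce/wikitext), which is what the Hugging Face `datasets` library expects. The following is a minimal sketch of how a listed dataset might be loaded, not an official recipe: the helper names `hub_dataset_url` and `load_listed_dataset` are hypothetical, and it assumes the `datasets` package is installed and that the chosen repo exposes a "train" split.

```python
from typing import Optional


def hub_dataset_url(repo_id: str) -> str:
    """Build the Hub page URL for a dataset repo id such as 'Salesforce/wikitext'."""
    return f"https://huggingface.co/datasets/{repo_id}"


def load_listed_dataset(repo_id: str, config: Optional[str] = None, split: str = "train"):
    """Download one split of a listed dataset (requires network access).

    `config` is the optional configuration name; some repos need one,
    others do not. The default split name "train" is an assumption.
    """
    # Imported lazily so the URL helper works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, config, split=split)
```

For instance, `load_listed_dataset("Salesforce/wikitext", "wikitext-2-raw-v1")` would fetch the training split of one WikiText configuration. Other repos in the list may use different configuration and split names, so check each dataset card first.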