The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics Paper • 2102.01672 • Published Feb 2, 2021
Datasets: A Community Library for Natural Language Processing Paper • 2109.02846 • Published Sep 7, 2021 • 14
Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources Paper • 2201.10066 • Published Jan 25, 2022
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset Paper • 2303.03915 • Published Mar 7, 2023 • 7