The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25 • 87
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations Paper • 2405.18392 • Published May 28 • 12
Power Hungry Processing: Watts Driving the Cost of AI Deployment? Paper • 2311.16863 • Published Nov 28, 2023 • 6
What's in the Box? A Preliminary Analysis of Undesirable Content in the Common Crawl Corpus Paper • 2105.02732 • Published May 6, 2021
OctoPack: Instruction Tuning Code Large Language Models Paper • 2308.07124 • Published Aug 14, 2023 • 28
Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements Paper • 2210.01970 • Published Sep 30, 2022 • 11
Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness Paper • 2302.10893 • Published Feb 7, 2023 • 6
Evaluating the Social Impact of Generative AI Systems in Systems and Society Paper • 2306.05949 • Published Jun 9, 2023 • 9
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus Paper • 2104.08758 • Published Apr 18, 2021
Quantifying the Carbon Emissions of Machine Learning Paper • 1910.09700 • Published Oct 21, 2019 • 11
ClimateGAN: Raising Climate Change Awareness by Generating Images of Floods Paper • 2110.02871 • Published Oct 6, 2021
SEAL: Interactive Tool for Systematic Error Analysis and Labeling Paper • 2210.05839 • Published Oct 11, 2022