The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25 • 87
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25 • 87
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25 • 87
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations Paper • 2405.18392 • Published May 28 • 12
A Dataset and Strong Baselines for Classification of Czech News Texts Paper • 2307.10666 • Published Jul 20, 2023
Evaluating the Search Phase of Neural Architecture Search Paper • 1902.08142 • Published Feb 21, 2019
Landmark Attention: Random-Access Infinite Context Length for Transformers Paper • 2305.16300 • Published May 25, 2023
Faster Causal Attention Over Large Sequences Through Sparse Flash Attention Paper • 2306.01160 • Published Jun 1, 2023 • 1
OctoPack: Instruction Tuning Code Large Language Models Paper • 2308.07124 • Published Aug 14, 2023 • 28
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only Paper • 2306.01116 • Published Jun 1, 2023 • 32
Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements Paper • 2210.01970 • Published Sep 30, 2022 • 11