The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper β’ 2406.17557 β’ Published Jun 25 β’ 87