HuggingFaceFW's Collections
🥂 FineWeb2
🍷 FineWeb datasets
📚 FineWeb-Edu
📀 Dataset comparison models
🧪 FineWeb v1 data experiments

🍷 FineWeb datasets

updated Jun 26, 2024
Upvote
24

  • 🍷 FineWeb: decanting the web for the finest text data at scale

    Space • Running • 939

    Generate high-quality web text data for LLM training


  • HuggingFaceFW/fineweb

    Viewer • Updated Jan 31 • 25B • 864k • 2.15k

  • HuggingFaceFW/fineweb-edu

    Viewer • Updated Jan 31 • 3.3B • 137k • 675

  • HuggingFaceFW/fineweb-edu-score-2

    Viewer • Updated Apr 11 • 13.1B • 31.5k • 73

  • The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

    Paper • 2406.17557 • Published Jun 25, 2024 • 97