Cosmopedia
Updated May 5
Resources for the Cosmopedia dataset
HuggingFaceTB/cosmopedia • Viewer • Updated Aug 12, 2024 • 31.1M • 4.83k • 632
HuggingFaceTB/cosmo-1b • Text Generation • 2B • Updated Jul 8, 2024 • 836 • 132
Web clusters • Running • 6
Browse and explore clustered web samples by educational value
HuggingFaceTB/cosmopedia-100k • Viewer • Updated Feb 19, 2024 • 100k • 304 • 45
HuggingFaceTB/cosmopedia-meta • Viewer • Updated Feb 20, 2024 • 31.1M • 38 • 2
HuggingFaceTB/smollm-corpus • Viewer • Updated Sep 6, 2024 • 237M • 23.9k • 352