Article FineWeb-C: A Community-Driven Dataset for Educational Quality Annotations in 122 Languages By davanstrien and 5 others • about 18 hours ago • 19
Article Bringing Fusion Down to Earth: ML for Stellarator Optimization By cgeorgiaw • 7 days ago • 62
Article Teaching Data Literacy with Hugging Face's AI Sheets By ParulPandey • 9 days ago • 23
Article Gemma 3n fully available in the open-source ecosystem! By ariG23498 and 7 others • 13 days ago • 105
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language Paper • 2506.20920 • Published 13 days ago • 58
Article Enhance Your Models in 5 Minutes with the Hugging Face Kernel Hub By drbh and 6 others • 27 days ago • 108
Article Featherless AI on Hugging Face Inference Providers 🔥 By sbrandeis and 5 others • 27 days ago • 43
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text Paper • 2506.05209 • Published Jun 5 • 42
Changelog Xet is now the default storage option for new users and organizations May 23 • 66
Article Interactive Tools for machine learning, deep learning, and math By Suzana • May 26 • 44
Article Good answers are not necessarily factual answers: an analysis of hallucination in leading LLMs By davidberenstein1957 and 1 other • May 7 • 38
Article I Clicked “I Agree”, But What Am I Really Consenting To? By giadap • Mar 26 • 24
Article Open R1: How to use OlympicCoder locally for coding? By burtenshaw and 4 others • Mar 20 • 62
Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM By ariG23498 and 3 others • Mar 12 • 440