🧠 SmolLM3 Collection Smol, multilingual, long-context reasoner • 7 items • Updated about 11 hours ago • 34
SmolLM3 evaluation datasets Collection Datasets to decontaminate the post-training mixtures against. Use the subset and column values described per entry. • 13 items • Updated about 17 hours ago • 4
SmolLM3 pretraining datasets Collection Datasets used in SmolLM3 pretraining • 14 items • Updated about 17 hours ago • 8
Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • 1 day ago • 304
Article Can AI Be Consentful? Rethinking Permission in the Age of Synthetic Everything By giadap • about 21 hours ago • 5
Audio Language Model Collection Open-source models and datasets, including Malaysian context. • 23 items • Updated 11 days ago • 2
Article Groq on Hugging Face Inference Providers 🔥 By sbrandeis and 4 others • 23 days ago • 39
Common Pile v0.1 Raw Data Collection 8TB of public domain and openly licensed text • 30 items • Updated Jun 6 • 14
Common Pile v0.1 Filtered Data Collection An LLM pre-training dataset produced by filtering and deduplicating the raw text collected in the Common Pile v0.1 • 31 items • Updated Jun 6 • 13
Article ScreenSuite - The most comprehensive evaluation suite for GUI Agents! Jun 6 • 49
Article Explore, Build, and Innovate AI Reasoning with NVIDIA’s Open Models and Recipes By nvidia and 2 others • Jun 4 • 21
Article Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H By Hcompany and 1 other • Jun 3 • 70
BioReason Collection BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model • 3 items • Updated 29 days ago • 14
ConTEB training datasets Collection Training data for the InSeNT method. • 3 items • Updated Jun 2 • 1
ConTEB evaluation datasets Collection Evaluation datasets of the ConTEB benchmark. Use the "test" split where available, otherwise "validation", otherwise "train". • 8 items • Updated Jun 2 • 1
Article *Context Is Gold to Find the Gold Passage*: Evaluating and Training Contextual Document Embeddings By manu and 1 other • Jun 2 • 24