SOAP: Improving and Stabilizing Shampoo using Adam Paper • 2409.11321 • Published Sep 17, 2024 • 1
Small Models Struggle to Learn from Strong Reasoners Paper • 2502.12143 • Published 20 days ago • 28
Granite Data Collection This collection has a set of artifacts which are related to curating and evaluating datasets used for Granite models • 16 items • Updated 9 days ago • 4
Article Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥 20 days ago • 93
Article From Llasa to Llasagna 🍝: Finetuning LLaSA to generates Italian speech and other languages By Steveeeeeeen and 1 other • 27 days ago • 26
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 199
The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training Paper • 2501.18965 • Published Jan 31 • 7
Article Mini-R1: Reproduce Deepseek R1 “aha moment” a RL tutorial By open-r1 • Jan 31 • 42
Article Mastering Long Contexts in LLMs with KVPress By nvidia and 1 other • Jan 23 • 64
Article How biased is Whisper ? Evaluating Whisper Models for Robustness to Diverse English Accents By Steveeeeeeen • Jan 29 • 16