Hymba: A Hybrid-head Architecture for Small Language Models • Paper • 2411.13676 • Published Nov 20, 2024
PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation • Paper • 2410.01680 • Published Oct 2, 2024
LLM Pruning and Distillation in Practice: The Minitron Approach • Paper • 2408.11796 • Published Aug 21, 2024