Performant local mixture-of-experts CPU inference with GPU acceleration in llama.cpp Article • 15 days ago • 10
LightMem: Lightweight and Efficient Memory-Augmented Generation Paper • 2510.18866 • Published Oct 21, 2025 • 114
Rethinking Large Language Model Distillation: A Constrained Markov Decision Process Perspective Paper • 2509.22921 • Published Sep 26, 2025 • 12