COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs Paper • 2502.17410 • Published Feb 24
LLMs Can Generate a Better Answer by Aggregating Their Own Responses Paper • 2503.04104 • Published Mar 6 • 1
SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation Paper • 2506.18349 • Published 3 days ago • 8