Reducing Transformer Key-Value Cache Size with Cross-Layer Attention. arXiv:2405.12981 (May 21, 2024)
TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation. arXiv:2503.04872 (March 2025)
FFN Fusion: Rethinking Sequential Computation in Large Language Models. arXiv:2503.18908 (March 2025)