Just an EZ way to collect papers on HF
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 32
- TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation
  Paper • 2503.04872 • Published • 14
- FFN Fusion: Rethinking Sequential Computation in Large Language Models
  Paper • 2503.18908 • Published • 17