Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction Paper • 2505.11254 • Published 8 days ago • 47
Model Merging in Pre-training of Large Language Models Paper • 2505.12082 • Published 7 days ago • 34
Article You could have designed state of the art positional encoding By FL33TW00D-HF • Nov 25, 2024 • 273