Softpick: No Attention Sink, No Massive Activations with Rectified Softmax Paper • 2504.20966 • Published 16 days ago • 26 • 5
MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding Paper • 2406.09297 • Published Jun 13, 2024 • 6 • 2