Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time Paper • 2310.17157 • Published Oct 26, 2023 • 11
Universal Language Model Fine-tuning for Text Classification Paper • 1801.06146 • Published Jan 18, 2018 • 6
Effective Approaches to Attention-based Neural Machine Translation Paper • 1508.04025 • Published Aug 17, 2015 • 2
Neural Machine Translation by Jointly Learning to Align and Translate Paper • 1409.0473 • Published Sep 1, 2014 • 4
RoFormer: Enhanced Transformer with Rotary Position Embedding Paper • 2104.09864 • Published Apr 20, 2021 • 10
Ring Attention with Blockwise Transformers for Near-Infinite Context Paper • 2310.01889 • Published Oct 3, 2023 • 10