Mixture of Attention Heads: Selecting Attention Heads Per Token Paper • 2210.05144 • Published Oct 11, 2022 • 2