BlackMamba: Mixture of Experts for State-Space Models Paper • 2402.01771 • Published Feb 1 • 23
A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive Coding Networks Paper • 2212.00720 • Published Nov 16, 2022
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters Paper • 2408.04093 • Published Aug 7 • 4