Hierarchical Autoregressive Transformers: Combining Byte- and Word-Level Processing for Robust, Adaptable Language Models Paper • 2501.10322 • Published 15 days ago