When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs (arXiv:2508.11383)
Hierarchical Autoregressive Transformers: Combining Byte- and Word-Level Processing for Robust, Adaptable Language Models (arXiv:2501.10322, published Jan 17)
Should We Still Pretrain Encoders with Masked Language Modeling? (arXiv:2507.00994, published Jul 1)
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning (arXiv:2507.00432, published Jul 1)