Dynamic Chunking for End-to-End Hierarchical Sequence Modeling Paper • 2507.07955 • Published 7 days ago
Energy-Based Transformers are Scalable Learners and Thinkers Paper • 2507.02092 • Published 15 days ago
DarwinLM: Evolutionary Structured Pruning of Large Language Models Paper • 2502.07780 • Published Feb 11