Taming LLMs by Scaling Learning Rates with Gradient Grouping • Paper • 2506.01049 • Published 7 days ago • 36
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization • Paper • 2504.00999 • Published Apr 1 • 92
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training • Paper • 2501.06842 • Published Jan 12 • 16