Reward-Guided Speculative Decoding for Efficient LLM Reasoning • Paper • 2501.19324 • Published 5 days ago • 30
The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training • Paper • 2501.18965 • Published 5 days ago • 5
Trading Inference-Time Compute for Adversarial Robustness • Paper • 2501.18841 • Published 5 days ago • 3
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference • Paper • 2502.00299 • Published 4 days ago • 1
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs • Paper • 2501.18585 • Published 5 days ago • 46
Large Language Models Think Too Fast To Explore Effectively • Paper • 2501.18009 • Published 6 days ago • 22
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training • Paper • 2501.17161 • Published 7 days ago • 99
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling • Paper • 2501.16975 • Published 8 days ago • 21
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models • Paper • 2502.01142 • Published 2 days ago • 10
PaSa: An LLM Agent for Comprehensive Academic Paper Search • Paper • 2501.10120 • Published 19 days ago • 42
MiniMax-01: Scaling Foundation Models with Lightning Attention • Paper • 2501.08313 • Published 22 days ago • 272