RoFormer: Enhanced Transformer with Rotary Position Embedding Paper • 2104.09864 • Published Apr 20, 2021 • 13
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences Paper • 2404.03715 • Published Apr 4, 2024 • 62
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM Paper • 2401.02994 • Published Jan 4, 2024 • 51
The Prompt Report: A Systematic Survey of Prompting Techniques Paper • 2406.06608 • Published Jun 6, 2024 • 64
Extreme Compression of Large Language Models via Additive Quantization Paper • 2401.06118 • Published Jan 11, 2024 • 13
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5, 2024 • 120
HyperZ·Z·W Operator Connects Slow-Fast Networks for Full Context Interaction Paper • 2401.17948 • Published Jan 31, 2024 • 4
Grokfast: Accelerated Grokking by Amplifying Slow Gradients Paper • 2405.20233 • Published May 30, 2024 • 6
Stream of Search (SoS): Learning to Search in Language Paper • 2404.03683 • Published Apr 1, 2024 • 32
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22, 2025 • 391
Preference Leakage: A Contamination Problem in LLM-as-a-judge Paper • 2502.01534 • Published Feb 3, 2025 • 40
Levels of AGI: Operationalizing Progress on the Path to AGI Paper • 2311.02462 • Published Nov 4, 2023 • 37
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published Mar 12, 2025 • 72
Large Language Model Agent: A Survey on Methodology, Applications and Challenges Paper • 2503.21460 • Published Mar 27, 2025 • 77
A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency Paper • 2505.01658 • Published 22 days ago • 35
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures Paper • 2505.09343 • Published 11 days ago • 59
Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen! Paper • 2505.15656 • Published 3 days ago • 13