ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows Paper • 2505.19897 • Published 13 days ago • 101
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models Paper • 2505.02847 • Published May 1 • 27
DeepMath Collection A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning • 5 items • Updated 16 days ago • 1
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks Paper • 2504.15521 • Published Apr 22 • 64
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning Paper • 2504.11456 • Published Apr 15 • 13
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs Paper • 2412.21187 • Published Dec 30, 2024 • 42
Model Merging Collection Model Merging is a very popular technique nowadays in LLM. Here is a chronological list of papers on the space that will help you get started with it! • 30 items • Updated Jun 12, 2024 • 238