Towards Dynamic Theory of Mind: Evaluating LLM Adaptation to Temporal Evolution of Human States • Paper • 2505.17663 • Published May 23 • 15
LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling • Paper • 2505.19187 • Published May 25 • 13
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale • Paper • 2409.17115 • Published Sep 25, 2024 • 64
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark • Paper • 2401.11944 • Published Jan 22, 2024 • 28