Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers Paper • 2409.04109 • Published Sep 6, 2024 • 49
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19, 2024 • 141
VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments Paper • 2506.02387 • Published Jun 3 • 57
CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning Capabilities of VLMs Paper • 2505.24120 • Published May 30 • 49
Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA Paper • 2505.21115 • Published May 27 • 137