CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives Paper • 2504.10823 • Published 4 days ago • 7
MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges? Paper • 2504.09702 • Published 6 days ago • 13
BiRdQA: A Bilingual Dataset for Question Answering on Tricky Riddles Paper • 2109.11087 • Published Sep 23, 2021