AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories Paper • 2504.08942 • Published 13 days ago • 27