view article Article MIEB: The Benchmark That Stress-Tests Image-Text Embeddings Like Never Before By isaacchung and 2 others • Apr 24 • 14
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories Paper • 2504.08942 • Published Apr 11 • 27
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories Paper • 2504.08942 • Published Apr 11 • 27