DABstep Reasoning Benchmark Leaderboard
Ranking of LLMs for agentic tasks
Tracks the performance of LLMs, VLMs, and agents on web navigation tasks
A leaderboard for LLMs powering smolagents