VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks Paper • 2410.19100 • Published Oct 24, 2024 • 6
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published Dec 18, 2024 • 50
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published Dec 18, 2024 • 50
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks Paper • 2410.19100 • Published Oct 24, 2024 • 6
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks Paper • 2410.19100 • Published Oct 24, 2024 • 6 • 2
VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks Paper • 2401.13649 • Published Jan 24, 2024 • 1
ICAL: Continual Learning of Multimodal Agents by Transforming Trajectories into Actionable Insights Paper • 2406.14596 • Published Jun 20, 2024 • 5
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale Paper • 2409.08264 • Published Sep 12, 2024 • 44
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale Paper • 2409.08264 • Published Sep 12, 2024 • 44