benchmark - a MisakiWang Collection


updated 24 days ago

OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web

Paper • 2402.17553 • Published Feb 27 • 21

MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs

Paper • 2410.04698 • Published Oct 7 • 13