Alpha Excel Benchmark

arXiv:2505.04110 · Published on May 7 · Submitted by dnoever on May 7

Abstract

This study presents a novel benchmark for evaluating Large Language Models (LLMs) using challenges derived from the Financial Modeling World Cup (FMWC) Excel competitions. We introduce a methodology for converting 113 existing FMWC challenges into programmatically evaluable JSON formats and use this dataset to compare the performance of several leading LLMs. Our findings demonstrate significant variations in performance across challenge categories, with models showing particular strength in pattern recognition tasks but struggling with complex numerical reasoning. The benchmark provides a standardized framework for assessing LLM capabilities on realistic business-oriented tasks rather than abstract academic problems. This research contributes to the growing field of AI benchmarking by establishing proficiency in Microsoft Excel, a tool used daily by an estimated 1.5 billion people, as a meaningful evaluation metric that bridges the gap between academic AI benchmarks and practical business applications.
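The abstract describes converting each FMWC challenge into a programmatically evaluable JSON record. Below is a minimal sketch of what such a record and an exact-match scoring harness could look like; the paper's actual schema is not reproduced on this page, so the field names (id, category, prompt, inputs, expected) and the scoring rule are illustrative assumptions, not the authors' format.

```python
# Minimal, hypothetical sketch: the paper's JSON schema is not shown on this
# page, so every field name below is an illustrative assumption.
import json

# One FMWC-style challenge as a programmatically evaluable JSON record:
# a natural-language prompt, structured inputs, and expected answers.
challenge = json.loads("""
{
  "id": "fmwc-042",
  "category": "numerical_reasoning",
  "prompt": "Given the monthly sales below, compute total Q1 revenue.",
  "inputs": {"sales": [1200, 950, 1430]},
  "expected": {"q1_revenue": 3580}
}
""")

def score_response(challenge: dict, model_answer: dict) -> float:
    """Fraction of expected fields the model answered exactly."""
    expected = challenge["expected"]
    hits = sum(1 for k, v in expected.items() if model_answer.get(k) == v)
    return hits / len(expected)

# A model that returns the right total scores 1.0 on this challenge.
print(score_response(challenge, {"q1_revenue": 3580}))  # 1.0
```

Aggregating such per-challenge scores by category would support the kind of category-level comparison the abstract reports.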

Community

Paper submitter: LLMs compete in Excel World Championships.

