arxiv:2505.05026

G-FOCUS: Towards a Robust Method for Assessing UI Design Persuasiveness

Published on May 8 · Submitted by jeochris on May 12

Authors: Jaehyun Jeon, Jang Han Yoon, Sumin Shim, Yejin Choi

Abstract

G-FOCUS, a novel inference-time reasoning strategy, enhances Vision-Language Models for assessing UI persuasiveness, complementing A/B testing.

AI-generated summary

Evaluating user interface (UI) design effectiveness extends beyond aesthetics to influencing user behavior, a principle central to Design Persuasiveness. A/B testing is the predominant method for determining which UI variations drive higher user engagement, but it is costly and time-consuming. While recent Vision-Language Models (VLMs) can perform automated UI analysis, current approaches focus on isolated design attributes rather than comparative persuasiveness, the key factor in optimizing user interactions. To address this, we introduce WiserUI-Bench, a benchmark designed for the Pairwise UI Design Persuasiveness Assessment task, featuring 300 real-world UI image pairs labeled with A/B test results and expert rationales. Additionally, we propose G-FOCUS, a novel inference-time reasoning strategy that enhances VLM-based persuasiveness assessment by reducing position bias and improving evaluation accuracy. Experimental results show that G-FOCUS surpasses existing inference strategies in consistency and accuracy for pairwise UI evaluation. By promoting VLM-driven evaluation of UI persuasiveness, our work offers an approach to complement A/B testing, propelling progress in scalable UI preference modeling and design optimization. Code and data will be released publicly.
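To make the notions of position bias, consistency, and accuracy in pairwise evaluation concrete, here is a minimal sketch of one way such an evaluation could be scored. It is not the paper's G-FOCUS method or its exact metric definitions. The `Judge` signature, the `evaluate_pairwise` function, and the pair format are hypothetical names introduced for illustration. The idea is to query a judge twice per pair with the presentation order swapped, count the pair as consistent only if both verdicts name the same design, and count it as correct only if that consistent verdict matches the A/B test winner.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical judge: receives (first_ui, second_ui) in presentation order and
# returns "first" or "second" for the design it considers more persuasive.
Judge = Callable[[str, str], str]


def evaluate_pairwise(
    judge: Judge,
    pairs: Sequence[Tuple[str, str, str]],
) -> Tuple[float, float]:
    """Return (consistent_accuracy, consistency) for labeled UI pairs.

    Each pair is (ui_a, ui_b, winner), where winner is "a" or "b"
    (assumed here to come from an A/B test result).
    """
    if not pairs:
        raise ValueError("pairs must not be empty")

    consistent = 0
    correct = 0
    for ui_a, ui_b, winner in pairs:
        # Query in both presentation orders and map each verdict back to a/b.
        forward = "a" if judge(ui_a, ui_b) == "first" else "b"
        backward = "b" if judge(ui_b, ui_a) == "first" else "a"

        # A position-biased judge changes its answer when the order is swapped.
        if forward == backward:
            consistent += 1
            if forward == winner:
                correct += 1

    n = len(pairs)
    return correct / n, consistent / n


def always_first(first_ui: str, second_ui: str) -> str:
    """Toy judge with maximal position bias: always prefers the first image."""
    return "first"


def alphabetical(first_ui: str, second_ui: str) -> str:
    """Toy order-independent judge: prefers the alphabetically smaller name."""
    return "first" if first_ui < second_ui else "second"
```

Under this scoring, a judge that always prefers whichever image is shown first never gives a consistent verdict, so it scores zero on both measures no matter how often its first-order answer happens to be right.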

Community

jeochris (paper author, paper submitter), 23 days ago

We introduce WiserUI-Bench, a benchmark with 300 real-world UI image pairs and A/B test results for assessing design persuasiveness. Our reasoning strategy, G-FOCUS, enhances VLMs' reliability in UI evaluation by reducing bias and improving accuracy.