arxiv:2508.01858

Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents

Published on Aug 3
· Submitted by Gnonymous on Aug 7
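Authors: Yuhan Guo, Cong Guo, Aiwen Sun, Hongliang He, Xinyu Yang, Yue Lu, Yingji Zhang, Xuntao Guo, Dong Zhang, Jianzhuang Liu, and others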

Abstract

A framework for web agents decomposes their capabilities into knowledge content learning and cognitive processes, using a structured dataset and a novel reasoning framework to enhance generalization and performance.

AI-generated summary

Multimodal large-scale models have significantly advanced the development of web agents, enabling perception and interaction with digital environments akin to human cognition. In this paper, we argue that web agents must first acquire sufficient knowledge to effectively engage in cognitive reasoning. Therefore, we decompose a web agent's capabilities into two essential stages: knowledge content learning and cognitive processes. To formalize this, we propose the Web-CogKnowledge Framework, which categorizes knowledge as Factual, Conceptual, and Procedural. In this framework, knowledge content learning corresponds to the agent's processes of Memorizing and Understanding, which rely on the first two knowledge types, representing the "what" of learning. Conversely, cognitive processes correspond to Exploring, grounded in Procedural knowledge, defining the "how" of reasoning and action. To facilitate knowledge acquisition, we construct the Web-CogDataset, a structured resource curated from 14 real-world websites and designed to systematically instill the core knowledge necessary for web agents. This dataset serves as the agent's conceptual grounding (the "nouns" upon which comprehension is built) as well as the basis for learning how to reason and act. Building on this foundation, we operationalize these processes through a novel knowledge-driven Chain-of-Thought (CoT) reasoning framework, developing and training our proposed agent, the Web-CogReasoner. Extensive experimentation reveals its significant superiority over existing models, especially in generalizing to unseen tasks where structured knowledge is decisive. To enable rigorous evaluation, we introduce the Web-CogBench, a comprehensive evaluation suite designed to assess and compare agent performance across the delineated knowledge domains and cognitive capabilities. Our code and data are open-sourced at https://github.com/Gnonymous/Web-CogReasoner

Community


Web-CogReasoner


Web-CogReasoner introduces a paradigm shift from simply enhancing web agents to systematically building their cognitive abilities from the ground up. Inspired by Bloom's Taxonomy, we decompose agent capabilities into knowledge content learning (Factual, Conceptual) and cognitive processes (Procedural), enabling interpretable and goal-directed behavior. Built upon large multimodal models, Web-CogReasoner performs knowledge-driven Chain-of-Thought (CoT) reasoning across complex web tasks, where each reasoning step is transparently grounded in a specific knowledge type, ensuring both interpretability and robust generalization.
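As a rough illustration of what "each reasoning step is grounded in a specific knowledge type" could look like, here is a minimal Python sketch. It is not the project's actual format: the `KnowledgeType` enum, the `ReasoningStep` class, the `render_trace` helper, and the example task are all hypothetical names invented for this illustration. Only the three knowledge types come from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class KnowledgeType(Enum):
    """The three knowledge types named in the text."""
    FACTUAL = "Factual"
    CONCEPTUAL = "Conceptual"
    PROCEDURAL = "Procedural"


@dataclass
class ReasoningStep:
    """One step of a chain-of-thought trace (hypothetical structure)."""
    knowledge: KnowledgeType  # knowledge type this step is grounded in
    thought: str              # free-text reasoning for the step


def render_trace(steps):
    """Format a trace so each line names the knowledge type it relies on."""
    return "\n".join(
        f"[{i}] ({s.knowledge.value}) {s.thought}"
        for i, s in enumerate(steps, start=1)
    )


# Made-up trace for an imaginary "search for a product" task.
trace = [
    ReasoningStep(KnowledgeType.FACTUAL,
                  "The page shows a text box labeled 'Search'."),
    ReasoningStep(KnowledgeType.CONCEPTUAL,
                  "A labeled text box next to a button is a search form."),
    ReasoningStep(KnowledgeType.PROCEDURAL,
                  "Type the query into the box, then click the button."),
]
print(render_trace(trace))
```

Tagging every step this way is one simple means of making a trace inspectable: a reader can check whether a wrong action came from misreading the page (Factual), misjudging what an element is for (Conceptual), or choosing the wrong sequence of actions (Procedural).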

To support this, we introduce:

  • Web-CogKnowledge Framework: A Bloom's Taxonomy-inspired two-stage training paradigm (Knowledge Content Learning → Cognitive Reasoning) for enhancing web agents' cognitive abilities.

  • Web-CogReasoner: A knowledge-driven multimodal agent trained via imitation learning on our Web-CogDataset.

  • Web-CogDataset: A curriculum-style dataset with 12 fine-grained tasks across 3 knowledge levels (Factual, Conceptual, Procedural), enabling stepwise skill acquisition.

  • Web-CogBench: A dedicated benchmark for evaluating whether a web agent possesses the requisite prior knowledge and cognitive capabilities for effective web navigation.
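The two-stage, curriculum-style idea in the list above can be sketched in a few lines of Python. This is an assumption-laden toy, not the released training pipeline: the sample dictionaries, the task names, and the helpers `curriculum_order` and `split_by_stage` are hypothetical. The only parts taken from the text are the three knowledge levels and their assignment to the two stages.

```python
# Stage assignment follows the text: Factual and Conceptual knowledge belong to
# Knowledge Content Learning, Procedural knowledge to Cognitive Reasoning.
STAGE_OF_LEVEL = {
    "Factual": "Knowledge Content Learning",
    "Conceptual": "Knowledge Content Learning",
    "Procedural": "Cognitive Reasoning",
}
LEVEL_ORDER = ["Factual", "Conceptual", "Procedural"]


def curriculum_order(samples):
    """Return samples sorted so lower knowledge levels come first (stable sort)."""
    return sorted(samples, key=lambda s: LEVEL_ORDER.index(s["level"]))


def split_by_stage(samples):
    """Group curriculum-ordered samples into the two training stages."""
    stages = {"Knowledge Content Learning": [], "Cognitive Reasoning": []}
    for sample in curriculum_order(samples):
        stages[STAGE_OF_LEVEL[sample["level"]]].append(sample)
    return stages


# Hypothetical samples; the task names are placeholders, not dataset tasks.
samples = [
    {"task": "hypothetical_navigation_task", "level": "Procedural"},
    {"task": "hypothetical_element_naming_task", "level": "Factual"},
    {"task": "hypothetical_page_purpose_task", "level": "Conceptual"},
]

stages = split_by_stage(samples)
for name, items in stages.items():
    print(name, [s["task"] for s in items])
```

Ordering by level before grouping keeps Factual samples ahead of Conceptual ones within the first stage, which mirrors the stepwise skill acquisition the dataset description mentions.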

Related Links

arXiv: https://arxiv.org/abs/2508.01858
Code: https://github.com/Gnonymous/Web-CogReasoner
Models: https://huggingface.co/Gnonymous/Web-CogReasoner
Dataset: https://huggingface.co/datasets/Gnonymous/Web-CogDataset
Blog: https://Gnonymous.github.io/blogs/Web-CogReasoner
Homepage: https://Gnonymous.github.io/Web-CogReasoner

Citation

@article{guo2025web,
  title={Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents},
  author={Guo, Yuhan and Guo, Cong and Sun, Aiwen and He, Hongliang and Yang, Xinyu and Lu, Yue and Zhang, Yingji and Guo, Xuntao and Zhang, Dong and Liu, Jianzhuang and others},
  journal={arXiv preprint arXiv:2508.01858},
  year={2025}
}
