LaTCoder: Converting Webpage Design to Code with Layout-as-Thought
Abstract
LaTCoder enhances layout preservation in design-to-code tasks by dividing webpage designs into blocks and using Chain-of-Thought reasoning with MLLMs, achieving significant improvements in automatic metrics and human preference evaluations.
Converting webpage designs into code (design-to-code) plays a vital role in User Interface (UI) development for front-end developers, bridging the gap between visual design and functional implementation. While recent Multimodal Large Language Models (MLLMs) have shown significant potential in design-to-code tasks, they often fail to accurately preserve the layout during code generation. To this end, we draw inspiration from Chain-of-Thought (CoT) reasoning in human cognition and propose LaTCoder, a novel approach that enhances layout preservation in webpage design during code generation with Layout-as-Thought (LaT). Specifically, we first introduce a simple yet efficient algorithm to divide the webpage design into image blocks. Next, we prompt MLLMs using a CoT-based approach to generate code for each block. Finally, we apply two assembly strategies (absolute positioning and an MLLM-based method), followed by dynamic selection to determine the optimal output. We evaluate the effectiveness of LaTCoder using multiple backbone MLLMs (i.e., DeepSeek-VL2, Gemini, and GPT-4o) on both a public benchmark and a newly introduced, more challenging benchmark (CC-HARD) that features complex layouts. The experimental results on automatic metrics demonstrate significant improvements. Specifically, TreeBLEU scores increased by 66.67% and MAE decreased by 38% when using DeepSeek-VL2, compared to direct prompting. Moreover, the human preference evaluation results indicate that annotators favor the webpages generated by LaTCoder in over 60% of cases, providing strong evidence of the effectiveness of our method.
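As a rough illustration of the block-division step, the sketch below splits a design screenshot into horizontal bands along wide whitespace runs. This is only a minimal guess at what a "simple yet efficient" division algorithm could look like; the whitespace heuristic, thresholds, and all function names are our own assumptions, not the paper's published algorithm.

```python
# Hypothetical block-division sketch: cut the design screenshot into
# horizontal bands wherever a near-white whitespace run is tall enough.
# The heuristic and every name here are illustrative assumptions.
import numpy as np
from PIL import Image

def split_rows(img: Image.Image, min_gap: int = 24, thresh: int = 245):
    """Return (top, bottom) pixel ranges of content bands."""
    gray = np.asarray(img.convert("L"))
    blank = (gray >= thresh).all(axis=1)   # rows that are near-white
    content = np.flatnonzero(~blank)       # indices of non-blank rows
    if content.size == 0:
        return []
    # Split wherever consecutive content rows are > min_gap pixels apart.
    breaks = np.flatnonzero(np.diff(content) > min_gap)
    starts = np.concatenate(([content[0]], content[breaks + 1]))
    ends = np.concatenate((content[breaks], [content[-1]])) + 1
    return list(zip(starts.tolist(), ends.tolist()))

design = Image.open("design.png")  # placeholder input path
blocks = [design.crop((0, top, design.width, bottom))
          for top, bottom in split_rows(design)]
```

Each cropped block would then be sent to the MLLM with a CoT-style prompt, as the abstract describes; real designs would likely also need recursive vertical cuts within each band.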
Community
Layout-as-Thought: A Paradigm for Layout-Preserving UI Code Generation
We propose Layout-as-Thought, a simple yet effective paradigm designed to alleviate the layout reasoning limitations of multimodal large language models (MLLMs) in design-to-code generation.
Rather than generating entire webpages in one pass, Layout-as-Thought decomposes the design into blocks and prompts MLLMs to reason and generate code step by step, leading to better structural fidelity and visual alignment. We instantiate this idea through LaTCoder and evaluate it on CC-HARD, a newly introduced benchmark featuring complex real-world webpage layouts.
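To make the final assembly and selection steps concrete, here is a hedged sketch (ours, not the released LaTCoder code) of the absolute-positioning strategy, which pins each block's generated HTML to its original bounding box, plus a dynamic-selection helper that keeps whichever candidate page renders closest to the design by pixel MAE. The `render` callable (e.g., a headless-browser screenshot function) and all names are assumptions.

```python
# Illustrative sketch of the assembly and dynamic-selection steps;
# not the authors' implementation.
import numpy as np
from PIL import Image

def assemble_absolute(blocks):
    """blocks: list of (x, y, w, h, html) with coordinates in design pixels.
    Pinning each block preserves the page-level layout by construction."""
    divs = [
        f'<div style="position:absolute; left:{x}px; top:{y}px; '
        f'width:{w}px; height:{h}px; overflow:hidden;">{html}</div>'
        for x, y, w, h, html in blocks
    ]
    return ('<!DOCTYPE html><html><head><meta charset="utf-8"></head>'
            '<body style="position:relative; margin:0">'
            + "\n".join(divs) + "</body></html>")

def pick_best(candidates, design: Image.Image, render):
    """Dynamic selection (assumed criterion): render each candidate page
    with a caller-supplied `render(html) -> PIL.Image` and keep the one
    with the lowest mean absolute pixel error against the design."""
    target = np.asarray(design.convert("RGB"), dtype=np.float32)
    def mae(html):
        shot = render(html).convert("RGB").resize(design.size)
        return float(np.abs(np.asarray(shot, dtype=np.float32) - target).mean())
    return min(candidates, key=mae)
```

The appeal of the absolute-positioning route is that layout fidelity no longer depends on the MLLM getting the global structure right; the trade-off, which presumably motivates the second, MLLM-based assembly strategy and the dynamic selection between them, is that absolutely positioned pages are rigid and less idiomatic.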
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- MLLM-Based UI2Code Automation Guided by UI Layout Information (2025)
- ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents (2025)
- Improved Iterative Refinement for Chart-to-Code Generation via Structured Instruction (2025)
- LOCOFY Large Design Models -- Design to code conversion solution (2025)
- Multilingual Multimodal Software Developer for Code Generation (2025)
- DesignCoder: Hierarchy-Aware and Self-Correcting UI Code Generation with Large Language Models (2025)
- ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation (2025)