deki-yolo: Mobile UI Element Detection Model

This is a YOLO model trained to identify common UI elements in mobile screenshots. It is the core detection model for the deki Hugging Face Space and the deki GitHub repository.

Model Description

The model is trained to detect the following four classes of UI elements:

  • View: General-purpose containers.
  • ImageView: Icons and images.
  • Text: Text elements.
  • Line: Separators and lines.

This model can be used as a foundational component for applications that need to understand screen layouts, such as AI agents for mobile automation, accessibility tools, and UI code generation.
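Since this is a standard YOLO checkpoint, it can in principle be loaded with the Ultralytics Python API. Below is a minimal sketch, not an official usage example; the weight file name `deki-yolo.pt` and the screenshot path are placeholders and may differ from the actual file in this repository.

```python
from ultralytics import YOLO

# Load the detection weights (placeholder file name; use the actual
# .pt file shipped with this repository).
model = YOLO("deki-yolo.pt")

# Run inference on a mobile screenshot.
results = model("screenshot.png")

# Print one line per detected UI element: class name, confidence, box.
for box in results[0].boxes:
    cls_name = model.names[int(box.cls)]  # View, ImageView, Text, or Line
    conf = float(box.conf)
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    print(f"{cls_name:<10} conf={conf:.2f} box=({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
```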


YOLO examples

Bounding boxes with classes for bb_1:

example1

Bounding boxes without classes but with IDs after NMS for bb_1:

example2

Bounding boxes with classes for bb_2:

example3

Bounding boxes without classes but with IDs after NMS for bb_2:

example4
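The second image in each pair shows boxes that survive non-maximum suppression and are then numbered sequentially. The card does not publish that post-processing code, but a rough class-agnostic sketch, assuming an Ultralytics result object and torchvision's `nms` helper, could look like this:

```python
import torch
from torchvision.ops import nms


def boxes_with_ids(result, iou_threshold: float = 0.5):
    """Apply class-agnostic NMS to a YOLO result and return (id, box) pairs."""
    xyxy = result.boxes.xyxy    # (N, 4) tensor of corner coordinates
    scores = result.boxes.conf  # (N,) tensor of confidences
    keep = nms(xyxy, scores, iou_threshold)
    # Number the surviving boxes sequentially; these are the IDs drawn
    # in the "without classes but with IDs" examples above.
    return [(i, xyxy[idx].tolist()) for i, idx in enumerate(keep)]
```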

YOLO model accuracy

The model was trained on 486 images and tested on 60 images.

Current YOLO model accuracy: example5
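If "accuracy" here refers to the standard detection metrics reported by Ultralytics, they could be reproduced with the validation API along the lines of the sketch below; `deki.yaml` is a hypothetical dataset config pointing at the 60-image test split and the four classes, and the weight file name is again a placeholder.

```python
from ultralytics import YOLO

model = YOLO("deki-yolo.pt")  # placeholder weight file name

# Validate against the held-out split described by the dataset config.
metrics = model.val(data="deki.yaml", split="test")
print(metrics.box.map50, metrics.box.map)  # mAP@0.5 and mAP@0.5:0.95
```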
