Interact with an agent to perform web-based tasks
Select elements in an image using text instructions