Recent advancements in Large Language Models (LLMs) and their multimodal counterparts have spurred interest in web agents: AI systems capable of autonomously navigating web environments.
However, current approaches struggle to automate complex web interactions because of a fundamental mismatch between human-designed interfaces and LLM capabilities.
Existing methods struggle with the complexity of web inputs, whether they process massive DOM trees, rely on augmented screenshots, or bypass the user interface entirely through API interactions.
The position paper proposes a paradigm shift in web agent research, advocating for the development of an Agentic Web Interface (AWI) optimized for AI systems.
Six guiding principles for AWI design are introduced, emphasizing safety, efficiency, and standardization, to account for the interests of all primary stakeholders.
The goal is to enable more effective, reliable, and transparent web agent designs by building interfaces for agents, rather than forcing agents to adapt to interfaces designed for humans.
The authors frame this as a collaborative effort involving the broader machine learning community, aimed at overcoming the limitations of existing web interfaces.