techminis

A naukri.com initiative

Marktechpost

Windows Agent Arena (WAA): A Scalable Open-Sourced Windows AI Agent Platform for Testing and Benchmarking Multi-modal, Desktop AI Agent

  • Researchers from Microsoft, Carnegie Mellon University, and Columbia University introduced WindowsAgentArena, a platform for testing and benchmarking multi-modal desktop AI agents in the Windows OS environment.
  • The platform can parallelize evaluation, completing a full benchmark run in 20 minutes, and it runs agents in an actual Windows OS environment, which makes for more realistic agent behavior.
  • Windows Agent Arena integrates with Docker containers, which provide a secure testing environment and allow evaluations to scale across multiple agents.
  • WindowsAgentArena offers a comprehensive, reproducible benchmark of 154 diverse tasks designed to mirror everyday Windows workflows.
  • The benchmark evaluates agent performance based on task completion rather than merely following human demonstrations.
  • Navi, a newly developed multi-modal AI agent, was evaluated on WindowsAgentArena and also performed reasonably well on the secondary web-based benchmark Mind2Web, demonstrating its adaptability across environments.
  • Navi relies on Set-of-Marks (SoM) prompting and UI Automation (UIA) tree parsing, which enable more precise agent interactions and pave the way for more capable and efficient AI agents.
  • Navi's performance on the benchmark remains well below the 74.5% success rate achieved by unassisted humans.
  • Researchers can leverage WindowsAgentArena's diverse set of tasks and innovative metrics to accelerate progress in multi-modal agent research.
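
The summary above describes scoring agents on task completion and running evaluations in parallel. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the WindowsAgentArena API: `Task`, `run_task`, `success_rate`, `toy_agent`, and the two sample tasks are all hypothetical names, and a local thread pool stands in for the containerized workers the platform actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, str]


@dataclass
class Task:
    """Hypothetical benchmark task: a name plus a check on the final state."""
    name: str
    check: Callable[[State], bool]  # True if the end state satisfies the goal


def run_task(task: Task, agent: Callable[[str], State]) -> bool:
    """Run the agent on one task and score only the resulting state,
    not the sequence of steps the agent took to get there."""
    final_state = agent(task.name)
    return task.check(final_state)


def success_rate(tasks: List[Task], agent: Callable[[str], State],
                 workers: int = 4) -> float:
    """Evaluate all tasks in parallel and return the fraction completed."""
    if not tasks:
        return 0.0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda t: run_task(t, agent), tasks))
    return sum(results) / len(tasks)


def toy_agent(task_name: str) -> State:
    """Stand-in agent (an assumption): it only 'succeeds' at renaming a file."""
    if task_name == "rename_file":
        return {"filename": "report.txt"}
    return {}


# Hypothetical tasks; each is judged by its end state alone.
TASKS = [
    Task("rename_file", lambda s: s.get("filename") == "report.txt"),
    Task("set_dark_mode", lambda s: s.get("theme") == "dark"),
]

if __name__ == "__main__":
    print(f"success rate: {success_rate(TASKS, toy_agent):.1%}")
```

Because each task is judged only by its final state, tasks are independent of one another, which is what makes splitting them across workers straightforward.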
