Researchers propose the UI-R1 framework to extend rule-based reinforcement learning (RL) to GUI action prediction tasks for large language models (LLMs) acting as graphical user interface (GUI) agents.
The UI-R1 framework applies DeepSeek-R1-style rule-based RL to a curated dataset of 136 challenging tasks spanning five common mobile-device action types, optimizing the model's reasoning capabilities.
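To make the rule-based reward idea concrete, here is a minimal sketch of how such a reward might score a predicted GUI action against the ground truth. This is an illustration in the spirit of DeepSeek-R1-style rule-based rewards, not UI-R1's actual reward function: the specific terms (action-type match plus a point-in-bounding-box grounding check for clicks) and their weights are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Action:
    action_type: str                        # e.g. "click", "scroll", "back"
    point: Optional[Tuple[float, float]] = None  # (x, y) for click-style actions

def rule_based_reward(pred: Action,
                      gold: Action,
                      gold_bbox: Optional[Tuple[float, float, float, float]] = None) -> float:
    """Hypothetical rule-based reward: +1 if the predicted action type
    matches the ground truth, and for click actions an additional +1 if
    the predicted point lands inside the target element's bounding box
    (x1, y1, x2, y2). No learned reward model is involved."""
    r = 0.0
    if pred.action_type == gold.action_type:
        r += 1.0  # action-type accuracy term
        if gold.action_type == "click" and pred.point and gold_bbox:
            x, y = pred.point
            x1, y1, x2, y2 = gold_bbox
            if x1 <= x <= x2 and y1 <= y <= y2:
                r += 1.0  # grounding term: click point hits the target element
    return r
```

Because the reward is computed from simple rules rather than a trained reward model, it is cheap to evaluate at scale and cannot be gamed the way a learned critic can, which is part of the appeal of this training style.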
The UI-R1 framework shows significant improvements in action-type accuracy and grounding accuracy over the base model, in both in-domain and out-of-domain scenarios.
UI-R1 outperforms most 7B models on GUI grounding benchmarks, matching the performance of state-of-the-art models trained with supervised fine-tuning on much larger datasets.