Researchers propose the UI-R1 framework to extend rule-based reinforcement learning (RL) to GUI action prediction tasks for large language models (LLMs) acting as graphical user interface (GUI) agents.
The UI-R1 framework applies DeepSeek-R1-style rule-based RL to a curated dataset of 136 challenging tasks spanning five common mobile-device action types, optimizing the model's reasoning capabilities.
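To make the rule-based reward idea concrete, here is a minimal sketch of how such a reward might score a predicted GUI action against the ground truth. This is an illustration in the spirit of DeepSeek-R1-style rule-based rewards, not UI-R1's actual reward function: the specific terms (action-type match plus a point-in-bounding-box grounding check for clicks) and their weights are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Action:
    action_type: str                        # e.g. "click", "scroll", "back"
    point: Optional[Tuple[float, float]] = None  # (x, y) for click-style actions

def rule_based_reward(pred: Action,
                      gold: Action,
                      gold_bbox: Optional[Tuple[float, float, float, float]] = None) -> float:
    """Hypothetical rule-based reward: +1 if the predicted action type
    matches the ground truth, and for click actions an additional +1 if
    the predicted point lands inside the target element's bounding box
    (x1, y1, x2, y2). No learned reward model is involved."""
    r = 0.0
    if pred.action_type == gold.action_type:
        r += 1.0  # action-type accuracy term
        if gold.action_type == "click" and pred.point and gold_bbox:
            x, y = pred.point
            x1, y1, x2, y2 = gold_bbox
            if x1 <= x <= x2 and y1 <= y <= y2:
                r += 1.0  # grounding term: click point hits the target element
    return r
```

Because the reward is computed from simple rules rather than a trained reward model, it is cheap to evaluate at scale and cannot be gamed the way a learned critic can, which is part of the appeal of this training style.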
The UI-R1 framework shows significant improvements in action-type accuracy and grounding accuracy over the base model, in both in-domain and out-of-domain scenarios.
UI-R1 outperforms most 7B models on GUI grounding benchmarks, matching the performance of state-of-the-art models trained with supervised fine-tuning on much larger datasets.