The general-utility Markov decision processes (GUMDPs) framework extends traditional MDPs by allowing the objective function to depend on the frequency with which state-action pairs are visited under a given policy.
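For concreteness, under one common formalization (the notation below is illustrative and not necessarily the paper's), the discounted GUMDP objective can be written in terms of the normalized occupancy measure $d_\pi$ induced by a policy $\pi$:

\[
d_\pi(s,a) = (1-\gamma)\sum_{t=0}^{\infty} \gamma^{t}\,\Pr\!\left(S_t = s,\, A_t = a \mid \pi\right),
\qquad
J(\pi) = f(d_\pi),
\]

where a standard MDP is recovered when $f$ is linear, i.e., $f(d) = \sum_{s,a} d(s,a)\, r(s,a)$ for some reward function $r$.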
This study analyzes the impact of the number of trials, i.e., the number of randomly sampled trajectories, in infinite-horizon GUMDPs, showing that, in contrast with standard MDPs, the number of trials plays a key role: the expected performance of a policy generally depends on the number of trials used to evaluate it.
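To see why trials matter here but not in standard MDPs (a one-line argument under the notation assumed above): with $N$ sampled trajectories, let $\hat d_{\pi,N}$ denote the empirical occupancy measure; the finite-trials objective is $J_N(\pi) = \mathbb{E}\big[f(\hat d_{\pi,N})\big]$, while the infinite-trials objective is $J_\infty(\pi) = f(d_\pi)$. When $f$ is linear, $\mathbb{E}[f(\hat d_{\pi,N})] = f(\mathbb{E}[\hat d_{\pi,N}]) = f(d_\pi)$ by linearity of expectation, so $N$ is irrelevant; for nonlinear $f$, the two formulations differ in general.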
Policy evaluation under both discounted and average GUMDPs is investigated, providing bounds on the discrepancy between the finite- and infinite-trials formulations, together with empirical results that support the theoretical findings.
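As a minimal, runnable sketch of this phenomenon (a toy example under the notation assumed above, not the paper's experimental setup), the following Python snippet estimates the finite-trials value $\mathbb{E}[f(\hat d_{\pi,N})]$ for a nonlinear utility $f$ on a two-state MDP and compares it against the infinite-trials value $f(d_\pi)$ for several trial counts $N$:

```python
import numpy as np

# Toy illustration: estimate the finite-trials objective E[f(d_hat_N)]
# for a nonlinear utility f on a 2-state, 2-action MDP, and compare it
# against f(d_pi), the infinite-trials value, for several trial counts N.

rng = np.random.default_rng(0)

gamma = 0.9
T = 200                       # truncation horizon for the discounted sum
P = np.array([                # P[s, a, s']: transition probabilities
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.3, 0.7], [0.6, 0.4]],
])
pi = np.array([[0.5, 0.5],    # pi[s, a]: a fixed stochastic policy
               [0.5, 0.5]])

def f(d, eps=1e-12):
    """Nonlinear utility: negative entropy of the occupancy measure."""
    return float(np.sum(d * np.log(d + eps)))

def sample_occupancy():
    """Empirical discounted occupancy measure of a single trajectory."""
    d = np.zeros((2, 2))
    s = 0                                    # fixed initial state
    for t in range(T):
        a = rng.choice(2, p=pi[s])
        d[s, a] += (1 - gamma) * gamma**t
        s = rng.choice(2, p=P[s, a])
    return d

def exact_occupancy():
    """d_pi via the (truncated) discounted flow recursion."""
    mu = np.array([1.0, 0.0])                # initial state distribution
    d = np.zeros((2, 2))
    for t in range(T):
        d += (1 - gamma) * gamma**t * mu[:, None] * pi
        mu = np.einsum("s,sa,sap->p", mu, pi, P)  # next-state distribution
    return d

d_pi = exact_occupancy()
for N in (1, 10, 100):
    # Monte Carlo estimate of J_N = E[f(mean of N sampled occupancies)]
    vals = [f(np.mean([sample_occupancy() for _ in range(N)], axis=0))
            for _ in range(100)]
    print(f"N={N:4d}  J_N ~ {np.mean(vals):.4f}   J_inf = f(d_pi) = {f(d_pi):.4f}")
```

Because $f$ is nonlinear, the printed $J_N$ estimates differ from $f(d_\pi)$ at small $N$ and approach it as $N$ grows, illustrating the dependence of expected policy performance on the number of trials.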