The general-utility Markov decision processes (GUMDPs) framework extends traditional MDPs by allowing the objective function to depend on the frequency with which state-action pairs are visited under a given policy.
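For concreteness, under one common formalization (the notation below is illustrative and not necessarily the paper's), the discounted GUMDP objective can be written in terms of the normalized occupancy measure $d_\pi$ induced by a policy $\pi$:

\[
d_\pi(s,a) = (1-\gamma)\sum_{t=0}^{\infty} \gamma^{t}\,\Pr\!\left(S_t = s,\, A_t = a \mid \pi\right),
\qquad
J(\pi) = f(d_\pi),
\]

where a standard MDP is recovered when $f$ is linear, i.e., $f(d) = \sum_{s,a} d(s,a)\, r(s,a)$ for some reward function $r$.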
This study analyzes the impact of the number of trials, i.e., the number of randomly sampled trajectories, in infinite-horizon GUMDPs, showing that, in contrast with standard MDPs, the number of trials plays a key role: the expected performance of a policy generally depends on the number of trials used to evaluate it.
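To see why trials matter here but not in standard MDPs (a one-line argument under the notation assumed above): with $N$ sampled trajectories, let $\hat d_{\pi,N}$ denote the empirical occupancy measure; the finite-trials objective is $J_N(\pi) = \mathbb{E}\big[f(\hat d_{\pi,N})\big]$, while the infinite-trials objective is $J_\infty(\pi) = f(d_\pi)$. When $f$ is linear, $\mathbb{E}[f(\hat d_{\pi,N})] = f(\mathbb{E}[\hat d_{\pi,N}]) = f(d_\pi)$ by linearity of expectation, so $N$ is irrelevant; for nonlinear $f$, the two formulations differ in general.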
Policy evaluation under both discounted and average GUMDPs is investigated, providing bounds on the discrepancy between the finite- and infinite-trials formulations, together with empirical results that support the theoretical findings.
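As a minimal, runnable sketch of this phenomenon (a toy example under the notation assumed above, not the paper's experimental setup), the following Python snippet estimates the finite-trials value $\mathbb{E}[f(\hat d_{\pi,N})]$ for a nonlinear utility $f$ on a two-state MDP and compares it against the infinite-trials value $f(d_\pi)$ for several trial counts $N$:

```python
import numpy as np

# Toy illustration: estimate the finite-trials objective E[f(d_hat_N)]
# for a nonlinear utility f on a 2-state, 2-action MDP, and compare it
# against f(d_pi), the infinite-trials value, for several trial counts N.

rng = np.random.default_rng(0)

gamma = 0.9
T = 200                       # truncation horizon for the discounted sum
P = np.array([                # P[s, a, s']: transition probabilities
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.3, 0.7], [0.6, 0.4]],
])
pi = np.array([[0.5, 0.5],    # pi[s, a]: a fixed stochastic policy
               [0.5, 0.5]])

def f(d, eps=1e-12):
    """Nonlinear utility: negative entropy of the occupancy measure."""
    return float(np.sum(d * np.log(d + eps)))

def sample_occupancy():
    """Empirical discounted occupancy measure of a single trajectory."""
    d = np.zeros((2, 2))
    s = 0                                    # fixed initial state
    for t in range(T):
        a = rng.choice(2, p=pi[s])
        d[s, a] += (1 - gamma) * gamma**t
        s = rng.choice(2, p=P[s, a])
    return d

def exact_occupancy():
    """d_pi via the (truncated) discounted flow recursion."""
    mu = np.array([1.0, 0.0])                # initial state distribution
    d = np.zeros((2, 2))
    for t in range(T):
        d += (1 - gamma) * gamma**t * mu[:, None] * pi
        mu = np.einsum("s,sa,sap->p", mu, pi, P)  # next-state distribution
    return d

d_pi = exact_occupancy()
for N in (1, 10, 100):
    # Monte Carlo estimate of J_N = E[f(mean of N sampled occupancies)]
    vals = [f(np.mean([sample_occupancy() for _ in range(N)], axis=0))
            for _ in range(100)]
    print(f"N={N:4d}  J_N ~ {np.mean(vals):.4f}   J_inf = f(d_pi) = {f(d_pi):.4f}")
```

Because $f$ is nonlinear, the printed $J_N$ estimates differ from $f(d_\pi)$ at small $N$ and approach it as $N$ grows, illustrating the dependence of expected policy performance on the number of trials.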