This work explores the relationship between inverse reinforcement learning (IRL) and inverse optimization (IO) for Markov decision processes (MDPs).
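One way to make this connection concrete, using occupancy-measure notation that is assumed here rather than taken from the text, is to view IRL as the IO problem of recovering a cost under which the expert's behavior is optimal:
\[
\text{find } c \in \mathcal{C} \quad \text{such that} \quad \mu_E \in \arg\min_{\mu \in \mathcal{M}} \langle c, \mu \rangle,
\]
where \(\mathcal{M}\) is the set of occupancy measures consistent with the MDP dynamics, \(\mu_E\) is the expert's occupancy measure, and \(\mathcal{C}\) is the class of admissible cost functions.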
The study incorporates prior beliefs on the cost function's structure into IRL and apprenticeship learning (AL) problems.
The convex-analytic view of the AL formalism is identified as a relaxation of this framework, with standard AL recovered as the special case in which the regularization term is absent.
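As a schematic illustration (the notation and the exact form of the regularizer are assumptions, not taken from the text), such a regularized convex-analytic AL objective can be written as
\[
\min_{\mu \in \mathcal{M}} \; \max_{c \in \mathcal{C}} \; \langle c, \mu - \mu_E \rangle - R(c),
\]
where \(R\) encodes the prior beliefs about the cost structure; setting \(R \equiv 0\) recovers the standard AL game of matching the expert's occupancy measure.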
In the suboptimal-expert setting, the AL problem is formulated as a regularized min-max problem, which is solved with stochastic mirror descent (SMD), and convergence bounds for the resulting algorithm are established.
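The following is a minimal sketch of SMD on a saddle-point problem of this kind, written for a toy bilinear game over probability simplices rather than the paper's exact regularized objective; the matrix A, step size eta, horizon T, and noise level are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: stochastic mirror descent (SMD) on a bilinear saddle point
#   min_x max_y  x^T A y  over probability simplices,
# standing in for the regularized min-max AL objective sketched above.
# A, eta, T, and the gradient noise are toy assumptions, not the paper's setup.

rng = np.random.default_rng(0)
n, m = 5, 4                       # sizes of the two simplices (toy choice)
A = rng.normal(size=(n, m))       # hypothetical payoff matrix

x = np.full(n, 1.0 / n)           # min player: uniform start on the simplex
y = np.full(m, 1.0 / m)           # max player: uniform start on the simplex
eta = 0.1                         # step size (would be tuned or decayed)
T = 2000

x_avg = np.zeros(n)
y_avg = np.zeros(m)

for t in range(T):
    # Stochastic gradients: additive noise emulates sampled trajectories.
    g_x = A @ y + 0.1 * rng.normal(size=n)     # gradient w.r.t. x
    g_y = A.T @ x + 0.1 * rng.normal(size=m)   # gradient w.r.t. y

    # Entropic mirror (multiplicative-weights) updates keep iterates
    # on the simplex without an explicit projection step.
    x = x * np.exp(-eta * g_x)    # min player descends
    x /= x.sum()
    y = y * np.exp(+eta * g_y)    # max player ascends
    y /= y.sum()

    x_avg += x
    y_avg += y

x_avg /= T
y_avg /= T

# Duality gap of the averaged iterates: max_j (A^T x_avg)_j - min_i (A y_avg)_i.
print("duality gap estimate:",
      float(np.max(A.T @ x_avg) - np.min(A @ y_avg)))
```

The entropic mirror map is the natural choice on the simplex: the multiplicative update stays feasible without projection, which is the usual reason SMD (rather than plain projected stochastic gradient descent) is used for saddle-point problems of this form.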