This work explores the relationship between inverse reinforcement learning (IRL) and inverse optimization (IO) for Markov decision processes (MDPs).
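One way to make this connection concrete, using occupancy-measure notation that is assumed here rather than taken from the text, is to view IRL as the IO problem of recovering a cost under which the expert's behavior is optimal:
\[
\text{find } c \in \mathcal{C} \quad \text{such that} \quad \mu_E \in \arg\min_{\mu \in \mathcal{M}} \langle c, \mu \rangle,
\]
where \(\mathcal{M}\) is the set of occupancy measures consistent with the MDP dynamics, \(\mu_E\) is the expert's occupancy measure, and \(\mathcal{C}\) is the class of admissible cost functions.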
The study incorporates prior beliefs on the cost function's structure into IRL and apprenticeship learning (AL) problems.
The convex-analytic view of the AL formalism is identified as a relaxation of this framework, with standard AL recovered as the special case in which the regularization term is absent.
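As a schematic illustration (the notation and the exact form of the regularizer are assumptions, not taken from the text), such a regularized convex-analytic AL objective can be written as
\[
\min_{\mu \in \mathcal{M}} \; \max_{c \in \mathcal{C}} \; \langle c, \mu - \mu_E \rangle - R(c),
\]
where \(R\) encodes the prior beliefs about the cost structure; setting \(R \equiv 0\) recovers the standard AL game of matching the expert's occupancy measure.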
In the suboptimal-expert setting, the AL problem is formulated as a regularized min-max problem, which is solved with stochastic mirror descent (SMD), and convergence bounds for the resulting algorithm are established.
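The following is a minimal sketch of SMD on a saddle-point problem of this kind, written for a toy bilinear game over probability simplices rather than the paper's exact regularized objective; the matrix A, step size eta, horizon T, and noise level are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: stochastic mirror descent (SMD) on a bilinear saddle point
#   min_x max_y  x^T A y  over probability simplices,
# standing in for the regularized min-max AL objective sketched above.
# A, eta, T, and the gradient noise are toy assumptions, not the paper's setup.

rng = np.random.default_rng(0)
n, m = 5, 4                       # sizes of the two simplices (toy choice)
A = rng.normal(size=(n, m))       # hypothetical payoff matrix

x = np.full(n, 1.0 / n)           # min player: uniform start on the simplex
y = np.full(m, 1.0 / m)           # max player: uniform start on the simplex
eta = 0.1                         # step size (would be tuned or decayed)
T = 2000

x_avg = np.zeros(n)
y_avg = np.zeros(m)

for t in range(T):
    # Stochastic gradients: additive noise emulates sampled trajectories.
    g_x = A @ y + 0.1 * rng.normal(size=n)     # gradient w.r.t. x
    g_y = A.T @ x + 0.1 * rng.normal(size=m)   # gradient w.r.t. y

    # Entropic mirror (multiplicative-weights) updates keep iterates
    # on the simplex without an explicit projection step.
    x = x * np.exp(-eta * g_x)    # min player descends
    x /= x.sum()
    y = y * np.exp(+eta * g_y)    # max player ascends
    y /= y.sum()

    x_avg += x
    y_avg += y

x_avg /= T
y_avg /= T

# Duality gap of the averaged iterates: max_j (A^T x_avg)_j - min_i (A y_avg)_i.
print("duality gap estimate:",
      float(np.max(A.T @ x_avg) - np.min(A @ y_avg)))
```

The entropic mirror map is the natural choice on the simplex: the multiplicative update stays feasible without projection, which is the usual reason SMD (rather than plain projected stochastic gradient descent) is used for saddle-point problems of this form.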