Distributionally robust policy learning aims to find a policy that performs well under the worst-case distributional shift.
Existing methods for robust policy learning hedge against the worst-case joint distribution of the covariate and the outcome, which can be unnecessarily conservative when in practice only part of the data-generating distribution shifts.
This paper focuses on robust policy learning under concept drift, where only the conditional distribution of the outcome given the covariate changes while the covariate distribution remains fixed.
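To fix ideas, one common way to formalize such a shift (the divergence $D$, the radius $\delta$, and the notation here are illustrative assumptions rather than the paper's exact definitions) is through an uncertainty set that fixes the covariate marginal and perturbs only the conditional law:
\[
\mathcal{P}(\delta) = \bigl\{ Q : Q_X = P_X,\; D\bigl(Q_{Y \mid X} \,\big\|\, P_{Y \mid X}\bigr) \le \delta \bigr\},
\]
where $P$ denotes the training distribution of the covariate $X$ and the outcome $Y$.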
The paper proposes a learning algorithm that maximizes the estimated robust policy value within a given policy class and establishes an optimal sub-optimality gap for the resulting policy.
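As a hedged sketch of the objective (the estimator $\widehat{V}$ and the worst-case form below are assumed for illustration, not the paper's exact construction), the learned policy solves
\[
\hat{\pi} = \arg\max_{\pi \in \Pi} \widehat{V}(\pi), \qquad \widehat{V}(\pi) \approx \inf_{Q \in \mathcal{P}(\delta)} \mathbb{E}_{Q}\bigl[ Y(\pi(X)) \bigr],
\]
where $\Pi$ is the policy class and $Y(a)$ is the potential outcome under action $a$; the sub-optimality gap is then measured against the best-in-class robust value $\max_{\pi \in \Pi} \inf_{Q \in \mathcal{P}(\delta)} \mathbb{E}_{Q}[Y(\pi(X))]$.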