This paper studies transfer learning for dynamic decision-making problems modeled as non-stationary finite-horizon Markov decision processes (MDPs).
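For concreteness, in standard notation (my own, not taken from the paper), a non-stationary finite-horizon MDP lets the reward $r_t$ and transition density $p_t$ vary with the stage $t$, so the optimal action-value functions satisfy the backward Bellman recursion

$$
Q_t^*(s,a) \;=\; r_t(s,a) \;+\; \mathbb{E}_{s' \sim p_t(\cdot \mid s,a)}\Big[\max_{a'} Q_{t+1}^*(s',a')\Big], \qquad t = T, T-1, \dots, 1,
$$

with the convention $Q_{T+1}^* \equiv 0$.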
The authors propose a novel re-weighted targeting procedure to construct transferable RL samples and introduce transfer deep Q*-learning.
The method uses neural-network approximation with theoretical guarantees and accommodates both transferable and non-transferable reward functions and transition densities.
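A minimal sketch of what such a re-weighted targeting step could look like in fitted Q-iteration form, assuming pooled source/target transition batches and a pre-estimated density-ratio function; `density_ratio`, `make_qnet`, and all shapes here are hypothetical illustrations, not the authors' implementation:

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for the sketch.
STATE_DIM, N_ACTIONS, HORIZON = 4, 3, 5

def make_qnet():
    # One small network per stage t, approximating Q_t(s, .).
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

def fit_stage(qnet, s, a, target, weights, epochs=200, lr=1e-2):
    """Weighted regression of Q_t(s, a) onto Bellman targets."""
    opt = torch.optim.Adam(qnet.parameters(), lr=lr)
    for _ in range(epochs):
        pred = qnet(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = (weights * (pred - target) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

def transfer_fitted_q(target_data, source_data, density_ratio):
    """Backward induction over the horizon: at each stage, pool target
    samples (weight 1) with source samples re-weighted by an estimated
    density ratio, then regress onto the stage-t Bellman targets."""
    qnets = [make_qnet() for _ in range(HORIZON)]
    next_q = None  # convention: Q_{T+1} = 0
    for t in reversed(range(HORIZON)):
        # Each batch: (states, integer actions, rewards, next states).
        s_t, a_t, r_t, s2_t = target_data[t]
        s_s, a_s, r_s, s2_s = source_data[t]
        with torch.no_grad():
            def bellman(r, s2):
                if next_q is None:
                    return r
                return r + next_q(s2).max(dim=1).values
            y_t = bellman(r_t, s2_t)
            y_s = bellman(r_s, s2_s)
            w_s = density_ratio(t, s_s, a_s)  # hypothetical ratio estimator
        s = torch.cat([s_t, s_s]); a = torch.cat([a_t, a_s])
        y = torch.cat([y_t, y_s])
        w = torch.cat([torch.ones(len(y_t)), w_s])
        fit_stage(qnets[t], s, a, y, w)
        next_q = qnets[t]
    return qnets
```

The design point the sketch is meant to convey is that transfer happens at the level of the regression targets: source transitions only enter the stage-wise least-squares fit after re-weighting, so non-transferable components can be down-weighted rather than discarded.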
Experiments on synthetic and real datasets demonstrate the effectiveness of the proposed method in non-stationary reinforcement learning settings.