Integrating pre-collected offline data from a different (source) environment can improve the sample efficiency of reinforcement learning (RL), but discrepancies in transition dynamics between the source and target environments make this challenging.
Existing methods address this issue by penalizing or filtering out source transitions in regions where the dynamics gap is high, but the way they estimate that gap can be problematic.
To address these limitations, a new method called CompFlow is proposed, which leverages flow matching and optimal transport principles to model target dynamics.
CompFlow offers improved generalization for learning target dynamics and a principled estimation of the dynamics gap, resulting in enhanced performance compared to strong baselines in RL benchmarks with shifted dynamics.
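As an illustration of the two generic strategies mentioned above (penalizing versus filtering source transitions by their estimated dynamics gap), here is a minimal sketch. It is not CompFlow's estimator or training procedure: the per-transition gap values are taken as given, and the function names `penalize_rewards` and `filter_transitions`, the penalty weight `beta`, and the `keep_fraction` parameter are all hypothetical choices made for this example.

```python
def penalize_rewards(rewards, gaps, beta=1.0):
    """Subtract a penalty proportional to the estimated gap from each reward.

    `gaps` holds one non-negative dynamics-gap estimate per source transition;
    how those estimates are obtained is outside the scope of this sketch.
    """
    if len(rewards) != len(gaps):
        raise ValueError("rewards and gaps must have the same length")
    return [r - beta * g for r, g in zip(rewards, gaps)]


def filter_transitions(transitions, gaps, keep_fraction=0.5):
    """Keep the fraction of source transitions with the smallest estimated gap.

    The original ordering of the kept transitions is preserved.
    """
    if len(transitions) != len(gaps):
        raise ValueError("transitions and gaps must have the same length")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not transitions:
        return []
    n_keep = max(1, int(len(transitions) * keep_fraction))
    # Rank indices by gap (ascending), keep the lowest n_keep, restore order.
    ranked = sorted(range(len(transitions)), key=lambda i: gaps[i])
    kept = sorted(ranked[:n_keep])
    return [transitions[i] for i in kept]


# Toy data: four source transitions with made-up gap estimates.
transitions = ["a", "b", "c", "d"]
gaps = [0.1, 0.9, 0.3, 0.7]
rewards = [1.0, 1.0, 1.0, 1.0]

penalized = penalize_rewards(rewards, gaps, beta=0.5)
kept = filter_transitions(transitions, gaps, keep_fraction=0.5)
```

With these toy inputs, filtering keeps the two transitions with the smallest gaps (`"a"` and `"c"`), while penalizing lowers every reward in proportion to its gap. Either strategy is only as reliable as the gap estimates fed into it, which is the weakness the text attributes to existing methods.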