The success of deep reinforcement learning (DRL) relies on the availability and quality of training data, often requiring extensive interactions with specific environments.
Offline reinforcement learning (RL) offers a solution for real-world scenarios where data collection is costly or risky: it uses previously collected data, often gathered by domain experts, to search for a batch-constrained optimal policy.
Transition Scoring (TS) is introduced to score each transition in a mixed dataset by its similarity to the target domain, addressing the problem of source-target domain mismatch in offline RL.
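To make the idea concrete, the sketch below illustrates one way transition scores could be computed. It is a minimal, illustrative example only: it assumes similarity to the target domain is measured by the distance from each mixed-dataset transition to its nearest neighbor among a small set of target-domain transitions, with the function name `transition_scores` and the exponential scoring rule being assumptions rather than the paper's actual TS formulation.

```python
# Illustrative sketch of transition scoring (not the paper's exact method).
# Assumption: similarity to the target domain = negative nearest-neighbor
# distance in flattened (state, action, next_state) space.
import numpy as np

def transition_scores(mixed, target):
    """Score each mixed-dataset transition by similarity to the target domain.

    Both arguments are arrays of shape (N, d) whose rows are flattened
    (state, action, next_state) vectors.
    """
    # Pairwise Euclidean distances between mixed and target transitions.
    dists = np.linalg.norm(mixed[:, None, :] - target[None, :, :], axis=-1)
    nearest = dists.min(axis=1)      # distance to the closest target transition
    return np.exp(-nearest)          # higher score = more target-like

# Toy usage: transitions drawn from two slightly shifted distributions.
rng = np.random.default_rng(0)
target_batch = rng.normal(loc=0.0, size=(50, 6))
mixed_batch = np.vstack([rng.normal(loc=0.0, size=(30, 6)),    # target-like
                         rng.normal(loc=3.0, size=(30, 6))])   # source-like
scores = transition_scores(mixed_batch, target_batch)
print(scores[:5], scores[-5:])  # target-like transitions receive higher scores
```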
Curriculum Learning-Based Trajectory Valuation (CLTV) leverages these transition scores to identify and prioritize high-quality trajectories, enhancing the performance and transferability of policies learned by offline RL algorithms; a sketch of the idea follows.
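The following sketch shows one plausible reading of the curriculum idea: value each trajectory by aggregating its transition scores, then expose the learner to trajectories in stages, starting from the highest-valued ones. The mean aggregation, the staging scheme, and the helper names `trajectory_values` and `curriculum_stages` are assumptions for illustration; the actual CLTV procedure may differ.

```python
# Illustrative sketch of curriculum-style trajectory valuation.
# Assumptions: a trajectory's value is the mean of its transition scores,
# and the curriculum grows the training set from high- to low-valued
# trajectories over a fixed number of stages.
import numpy as np

def trajectory_values(trajectories, score_fn):
    """Value each trajectory as the mean score of its transitions."""
    return np.array([score_fn(traj).mean() for traj in trajectories])

def curriculum_stages(trajectories, values, n_stages=3):
    """Yield cumulative training sets, adding lower-valued trajectories later."""
    order = np.argsort(-values)                      # best trajectories first
    stage_size = int(np.ceil(len(order) / n_stages))
    for stage in range(1, n_stages + 1):
        selected = order[: stage * stage_size]
        yield [trajectories[i] for i in selected]

# Toy usage with a placeholder scorer and random "trajectories".
rng = np.random.default_rng(1)
trajs = [rng.normal(size=(rng.integers(5, 15), 6)) for _ in range(9)]
score_fn = lambda traj: np.exp(-np.linalg.norm(traj, axis=-1))  # placeholder
values = trajectory_values(trajs, score_fn)
for k, batch in enumerate(curriculum_stages(trajs, values), start=1):
    print(f"stage {k}: training on {len(batch)} trajectories")
```

In this reading, the offline RL algorithm would be trained on each successive stage's trajectory set, so that policy learning begins from the most target-like, highest-quality data before lower-valued trajectories are gradually introduced.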