TROFI is a new approach to offline reinforcement learning that aims to train agents without a predefined reward function.
It first learns a reward function from human preferences and uses it to relabel the otherwise reward-free dataset, which then enables training of the policy with standard offline RL.
Experiments on the D4RL benchmark show that TROFI outperforms baselines and performs comparably to training with the ground-truth reward.
The efficacy of TROFI is further validated in a 3D game environment, underscoring the importance of a well-engineered reward function in reinforcement learning.
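To make the preference-based reward-learning step concrete, the sketch below shows one common way such a reward model can be fit from pairwise human preferences with a Bradley-Terry objective and then used to relabel an offline dataset. This is a minimal illustrative example, not TROFI's actual implementation; the network architecture, loss, and all names and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

# Minimal sketch (assumptions, not TROFI's actual code): fit a reward
# model r_phi(s, a) to pairwise segment preferences via a Bradley-Terry
# loss, then relabel a reward-free offline dataset for offline RL.

class RewardModel(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # Per-step reward prediction for each (state, action) pair.
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


def preference_loss(model, seg_a, seg_b, prefs):
    """Bradley-Terry loss over pairs of trajectory segments.

    seg_a, seg_b: tuples (obs, act) with shapes (batch, T, obs_dim)
        and (batch, T, act_dim).
    prefs: (batch,) float tensor, 1.0 if segment A is preferred, else 0.0.
    """
    ret_a = model(*seg_a).sum(dim=-1)  # predicted return of segment A
    ret_b = model(*seg_b).sum(dim=-1)  # predicted return of segment B
    logits = ret_a - ret_b
    return nn.functional.binary_cross_entropy_with_logits(logits, prefs)


if __name__ == "__main__":
    obs_dim, act_dim, T, batch = 11, 3, 25, 32
    model = RewardModel(obs_dim, act_dim)
    opt = torch.optim.Adam(model.parameters(), lr=3e-4)

    # Dummy batch standing in for human-labelled segment pairs.
    seg_a = (torch.randn(batch, T, obs_dim), torch.randn(batch, T, act_dim))
    seg_b = (torch.randn(batch, T, obs_dim), torch.randn(batch, T, act_dim))
    prefs = torch.randint(0, 2, (batch,)).float()

    loss = preference_loss(model, seg_a, seg_b, prefs)
    opt.zero_grad()
    loss.backward()
    opt.step()

    # Relabel offline transitions with the learned reward; a standard
    # offline RL algorithm can then be trained on the relabelled data.
    obs, act = torch.randn(1024, obs_dim), torch.randn(1024, act_dim)
    with torch.no_grad():
        rewards = model(obs, act)
    print(rewards.shape)  # torch.Size([1024])
```

In this formulation the reward model is trained only to rank segments consistently with the human labels; any offline RL method can consume the relabelled transitions afterwards without further changes.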