TROFI is a new approach to offline reinforcement learning that aims to train agents without a predefined reward function.
It first learns a reward function from human preferences and uses it to relabel the otherwise reward-free dataset, which then enables training of the policy with standard offline RL.
Experiments on the D4RL benchmark show that TROFI outperforms baselines and performs comparably to training with the ground-truth reward.
The efficacy of TROFI is further validated in a 3D game environment, underscoring the importance of a well-engineered reward function in reinforcement learning.
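To make the preference-based reward-learning step concrete, the sketch below shows one common way such a reward model can be fit from pairwise human preferences with a Bradley-Terry objective and then used to relabel an offline dataset. This is a minimal illustrative example, not TROFI's actual implementation; the network architecture, loss, and all names and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

# Minimal sketch (assumptions, not TROFI's actual code): fit a reward
# model r_phi(s, a) to pairwise segment preferences via a Bradley-Terry
# loss, then relabel a reward-free offline dataset for offline RL.

class RewardModel(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # Per-step reward prediction for each (state, action) pair.
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


def preference_loss(model, seg_a, seg_b, prefs):
    """Bradley-Terry loss over pairs of trajectory segments.

    seg_a, seg_b: tuples (obs, act) with shapes (batch, T, obs_dim)
        and (batch, T, act_dim).
    prefs: (batch,) float tensor, 1.0 if segment A is preferred, else 0.0.
    """
    ret_a = model(*seg_a).sum(dim=-1)  # predicted return of segment A
    ret_b = model(*seg_b).sum(dim=-1)  # predicted return of segment B
    logits = ret_a - ret_b
    return nn.functional.binary_cross_entropy_with_logits(logits, prefs)


if __name__ == "__main__":
    obs_dim, act_dim, T, batch = 11, 3, 25, 32
    model = RewardModel(obs_dim, act_dim)
    opt = torch.optim.Adam(model.parameters(), lr=3e-4)

    # Dummy batch standing in for human-labelled segment pairs.
    seg_a = (torch.randn(batch, T, obs_dim), torch.randn(batch, T, act_dim))
    seg_b = (torch.randn(batch, T, obs_dim), torch.randn(batch, T, act_dim))
    prefs = torch.randint(0, 2, (batch,)).float()

    loss = preference_loss(model, seg_a, seg_b, prefs)
    opt.zero_grad()
    loss.backward()
    opt.step()

    # Relabel offline transitions with the learned reward; a standard
    # offline RL algorithm can then be trained on the relabelled data.
    obs, act = torch.randn(1024, obs_dim), torch.randn(1024, act_dim)
    with torch.no_grad():
        rewards = model(obs, act)
    print(rewards.shape)  # torch.Size([1024])
```

In this formulation the reward model is trained only to rank segments consistently with the human labels; any offline RL method can consume the relabelled transitions afterwards without further changes.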