Offline reinforcement learning aims to learn policies from a fixed dataset without online exploration; model-based approaches additionally learn a dynamics model to generate simulated data for policy learning.
A new approach, offline trajectory optimization (OTTO), is proposed, which conducts long-horizon simulations and uses model uncertainty to evaluate and correct the simulated data.
OTTO employs an ensemble of Transformers, called World Transformers, to predict environment dynamics and reward functions; long-horizon trajectories are simulated with the World Transformers, and low-confidence transitions are evaluated and corrected by an uncertainty-based World Evaluator.
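The core mechanism, using disagreement across an ensemble of Transformer world models as an uncertainty signal for simulated transitions, can be sketched as follows. This is a minimal illustration rather than the authors' implementation; the class names, network sizes, and the disagreement threshold are assumptions made for the example.

```python
# Minimal sketch (not the paper's code) of ensemble world models with
# disagreement-based uncertainty; all names and hyperparameters are illustrative.
import torch
import torch.nn as nn

class WorldModel(nn.Module):
    """One ensemble member: a small Transformer mapping a history of
    (state, action) pairs to a predicted next state and reward."""
    def __init__(self, state_dim, action_dim, d_model=64, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(state_dim + action_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, state_dim + 1)  # next state + reward

    def forward(self, states, actions):
        x = self.embed(torch.cat([states, actions], dim=-1))  # (B, T, d_model)
        h = self.encoder(x)[:, -1]                            # last time step
        out = self.head(h)
        return out[:, :-1], out[:, -1]                        # next_state, reward

def ensemble_step(models, states, actions, threshold=0.5):
    """Roll the ensemble one step and flag low-confidence predictions by the
    disagreement (standard deviation) across ensemble members."""
    preds = [m(states, actions) for m in models]
    next_states = torch.stack([s for s, _ in preds])  # (K, B, state_dim)
    rewards = torch.stack([r for _, r in preds])      # (K, B)
    mean_state = next_states.mean(0)
    uncertainty = next_states.std(0).norm(dim=-1) + rewards.std(0)
    low_confidence = uncertainty > threshold          # candidates for correction
    return mean_state, rewards.mean(0), low_confidence

# Usage: one simulated step with an ensemble of 4 world models on random data.
if __name__ == "__main__":
    state_dim, action_dim = 11, 3
    models = [WorldModel(state_dim, action_dim) for _ in range(4)]
    states = torch.randn(8, 5, state_dim)   # batch of 8 histories, length 5
    actions = torch.randn(8, 5, action_dim)
    s_next, r, flags = ensemble_step(models, states, actions)
    print(s_next.shape, r.shape, flags.sum().item(), "low-confidence samples")
```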
Experiments indicate that OTTO improves the performance of offline RL algorithms, even in complex, sparse-reward environments such as AntMaze.