Offline Trajectory Optimization for Offline Reinforcement Learning

  • Offline reinforcement learning aims to learn policies from previously collected data without online exploration; model-based approaches use a learned dynamics model to generate simulated data for policy learning.
  • A new approach called Offline Trajectory Optimization (OTTO) is proposed, which performs long-horizon simulations and uses model uncertainty to evaluate and correct the generated data.
  • OTTO uses an ensemble of Transformers, called World Transformers, to predict environment dynamics and reward functions; it generates long-horizon trajectory simulations and corrects low-confidence data with an uncertainty-based World Evaluator (a simplified sketch of this rollout-and-filter loop follows the list).
  • Experiments indicate that OTTO can improve the performance of offline RL algorithms, even in complex environments with sparse rewards such as AntMaze.
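The core mechanism described above (ensemble world model, long-horizon rollout, uncertainty-based filtering) can be illustrated with a minimal sketch. This is not the paper's World Transformers implementation: the small MLP ensemble, the class and function names, the random policy, and the uncertainty threshold are all illustrative assumptions standing in for the actual architecture and correction procedure.

```python
# Minimal sketch of the OTTO idea: an ensemble world model generates
# long-horizon rollouts, and ensemble disagreement serves as the
# uncertainty signal used to reject (or correct) low-confidence data.
# The MLP ensemble and all names here are assumptions, not the
# paper's World Transformers / World Evaluator.

import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, ENSEMBLE_SIZE, HORIZON = 17, 6, 5, 20


class WorldModel(nn.Module):
    """One ensemble member: predicts next state and reward from (s, a)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 256), nn.ReLU(),
            nn.Linear(256, STATE_DIM + 1),  # next state + scalar reward
        )

    def forward(self, state, action):
        out = self.net(torch.cat([state, action], dim=-1))
        return out[..., :STATE_DIM], out[..., STATE_DIM]


def rollout(models, policy, init_state, horizon=HORIZON, unc_threshold=0.5):
    """Simulate a long-horizon trajectory; keep only low-uncertainty steps."""
    trajectory, state = [], init_state
    for _ in range(horizon):
        action = policy(state)
        preds = [m(state, action) for m in models]         # ensemble predictions
        next_states = torch.stack([p[0] for p in preds])   # (E, STATE_DIM)
        rewards = torch.stack([p[1] for p in preds])       # (E,)
        uncertainty = next_states.std(dim=0).mean()        # ensemble disagreement
        if uncertainty > unc_threshold:
            break  # stop once the model is no longer trusted at this step
        next_state = next_states.mean(dim=0)
        trajectory.append((state, action, rewards.mean(), next_state))
        state = next_state
    return trajectory


if __name__ == "__main__":
    models = [WorldModel() for _ in range(ENSEMBLE_SIZE)]
    random_policy = lambda s: torch.randn(ACTION_DIM)      # placeholder policy
    traj = rollout(models, random_policy, torch.randn(STATE_DIM))
    print(f"kept {len(traj)} simulated transitions")
```

The simulated transitions that survive the uncertainty check would then be added to the offline dataset used by a standard offline RL algorithm, which is the augmentation effect the experiments in the summary refer to.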
