Model-based offline reinforcement learning (MORL) focuses on learning a policy using a dynamics model from an existing dataset.
Existing MORL approaches generate trajectories that mimic the real data distribution, but they can produce unreliable trajectories because they overlook historical information.
A new MORL algorithm called Reliability-guaranteed Transformer (RT) is introduced to eliminate unreliable trajectories by assessing cumulative reliability.
The RT algorithm also generates high-return trajectories efficiently by sampling actions with high predicted rewards, and it has been shown to be effective on benchmark tasks.
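The summary above does not give implementation details, so the following is only an illustrative sketch of the two ideas it describes: discarding model rollouts whose cumulative reliability falls below a threshold, and picking high-reward candidate actions at each step. The functions `dynamics_model`, `reward_model`, and `reliability_estimate`, along with the horizon and threshold values, are hypothetical placeholders, not the paper's actual components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for learned components (not the paper's models).
def dynamics_model(state, action):
    # Toy dynamics: small drift plus noise.
    return state + 0.1 * action + 0.01 * rng.normal(size=state.shape)

def reward_model(state, action):
    # Toy reward: higher when the predicted next state is near the origin.
    return float(-np.sum((state + 0.1 * action) ** 2))

def reliability_estimate(state, action):
    # Placeholder: reliability decays as the state drifts from the data region.
    return float(np.exp(-0.5 * np.linalg.norm(state)))

def rollout(init_state, horizon=10, n_candidates=8, reliability_threshold=0.1):
    """Generate one model rollout, greedily picking the highest-reward
    candidate action at each step and tracking cumulative reliability."""
    state = init_state.copy()
    traj, total_return, cum_reliability = [], 0.0, 1.0
    for _ in range(horizon):
        # Sample candidate actions; keep the one with the highest predicted reward.
        candidates = rng.normal(size=(n_candidates, state.shape[0]))
        rewards = [reward_model(state, a) for a in candidates]
        action = candidates[int(np.argmax(rewards))]

        # Accumulate per-step reliability; discard the rollout if it drops too low.
        cum_reliability *= reliability_estimate(state, action)
        if cum_reliability < reliability_threshold:
            return None

        total_return += reward_model(state, action)
        traj.append((state, action))
        state = dynamics_model(state, action)
    return traj, total_return

# Keep only rollouts that survive the cumulative-reliability filter.
kept = [r for s in rng.normal(size=(20, 3)) if (r := rollout(s)) is not None]
print(f"kept {len(kept)} of 20 rollouts after the reliability filter")
```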