Offline safe reinforcement learning (RL) is a promising approach for learning safe behaviors without risky online interaction with the environment. Existing offline safe RL methods, however, often produce policies that are either overly conservative or in violation of the safety constraints. This paper proposes a new approach to offline safe RL that learns a policy to generate desirable trajectories while avoiding undesirable ones. The approach partitions a pre-collected dataset into desirable and undesirable subsets and uses a classifier to score trajectories.
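To make the partition-and-score idea concrete, below is a minimal sketch of one plausible instantiation; the paper does not specify these details, so the feature representation, the cost and return thresholds (`cost_limit`, `return_floor`), and the choice of a logistic-regression classifier are all illustrative assumptions, not the paper's method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical sketch: split a batch of pre-collected trajectories into
# desirable and undesirable subsets by cumulative cost and return, then
# fit a classifier whose probability output scores each trajectory.
# All names and thresholds here are assumptions for illustration.

rng = np.random.default_rng(0)

# Each trajectory is summarized by a feature vector (e.g., an embedding
# of its states and actions) plus its cumulative return and cost.
n_traj, feat_dim = 200, 8
traj_features = rng.normal(size=(n_traj, feat_dim))
returns = traj_features[:, 0] + rng.normal(scale=0.1, size=n_traj)
costs = np.abs(traj_features[:, 1]) + rng.normal(scale=0.1, size=n_traj)

cost_limit = np.median(costs)      # assumed safety budget
return_floor = np.median(returns)  # assumed minimum acceptable return

# Desirable = safe AND high-return; everything else is undesirable.
desirable = (costs <= cost_limit) & (returns >= return_floor)

# Train a classifier to distinguish the two subsets; its predicted
# probability acts as a per-trajectory desirability score that a
# downstream policy objective could weight toward safe, rewarding behavior.
clf = LogisticRegression().fit(traj_features, desirable.astype(int))
scores = clf.predict_proba(traj_features)[:, 1]

print(f"mean score (desirable):   {scores[desirable].mean():.3f}")
print(f"mean score (undesirable): {scores[~desirable].mean():.3f}")
```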