A new method called Provably Lifetime Safe RL (PLS) has been proposed for safe reinforcement learning (RL).
PLS integrates offline safe RL with safe policy deployment, ensuring the safety of a policy throughout its lifetime, from learning to operation.
The method learns a policy offline using return-conditioned supervised learning and optimizes target returns using Gaussian processes (GPs) during deployment.
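The deployment-time idea of tuning a target return with a GP can be sketched as a small Bayesian-optimization loop. This is not the authors' algorithm: the `rollout_return` function, the kernel choice, and the UCB acquisition rule below are all hypothetical stand-ins used purely for illustration, with a synthetic stand-in for a return-conditioned policy interacting with an environment.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def rollout_return(target_return: float, rng: np.random.Generator) -> float:
    """Hypothetical stand-in: conditioning a policy on an over-ambitious
    target return yields diminishing (noisy) achieved returns."""
    achieved = min(target_return, 8.0) - 0.05 * target_return**2
    return achieved + rng.normal(scale=0.2)

def optimize_target_return(candidates, n_init=5, n_iter=15, seed=0):
    """Fit a GP mapping target return -> observed return, then pick the
    next target to try via an upper-confidence-bound acquisition rule."""
    rng = np.random.default_rng(seed)
    tried = list(rng.choice(candidates, size=n_init))
    observed = [rollout_return(g, rng) for g in tried]
    gp = GaussianProcessRegressor(
        kernel=RBF(length_scale=2.0) + WhiteKernel(noise_level=0.05),
        normalize_y=True,
    )
    for _ in range(n_iter):
        gp.fit(np.array(tried).reshape(-1, 1), np.array(observed))
        mu, sigma = gp.predict(np.array(candidates).reshape(-1, 1),
                               return_std=True)
        ucb = mu + 1.5 * sigma            # optimism drives exploration
        g_next = float(candidates[int(np.argmax(ucb))])
        tried.append(g_next)
        observed.append(rollout_return(g_next, rng))
    return tried[int(np.argmax(observed))]  # best target return found

best = optimize_target_return(np.linspace(0.0, 12.0, 25))
print(best)
```

In PLS the GP would additionally have to respect safety constraints when proposing target returns; the loop above only maximizes observed return, so it illustrates the surrogate-model mechanics rather than the safety guarantee.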
Empirical results show that PLS outperforms baselines on both safety and reward, achieving high rewards while keeping the policy safe.