<ul><li>Applying reinforcement learning to safety-critical applications requires safety guarantees during both policy training and deployment.</li><li>The paper introduces the Safe Policy Ratio (SPoRt), which bounds the probability of violating a safety property in a model-free, episodic setting.</li><li>SPoRt includes Projected PPO, a new approach for training task-specific policies while maintaining a user-specified bound on property violation.</li><li>The experimental results demonstrate the trade-off SPoRt makes between safety guarantees and task-specific performance.</li></ul>

SPoRt -- Safe Policy Ratio: Certified Training and Deployment of Task Policies in Model-Free RL
