Ensuring safety in reinforcement learning (RL)-based robotic systems is a critical challenge, especially in contact-rich tasks within unstructured environments.
Existing safe RL approaches largely focus on high-level recovery mechanisms and neglect safety at the level of low-level execution.
The proposed method, Bresa, decouples task learning from safety learning: a dedicated safety critic network runs at a higher frequency than the task policy, enabling real-time intervention when unsafe conditions arise.
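The decoupled, higher-frequency safety check can be illustrated with a minimal control-loop sketch. All names here (`task_policy`, `safety_critic`, `recovery_action`, the threshold and frequency ratio) are hypothetical stand-ins, not the authors' implementation; in Bresa the safety critic would be a learned network, here replaced by a hand-crafted force model for illustration.

```python
# Sketch of a decoupled safety critic running at a higher rate than
# the task policy. All names and dynamics are illustrative assumptions.

def task_policy(state):
    """Low-frequency task policy: always pushes toward contact."""
    return 1.0  # commanded velocity

def safety_critic(state, action):
    """High-frequency safety value; lower means less safe.
    Stand-in for a learned critic: penalize predicted contact force."""
    predicted_force = state["force"] + 0.5 * action
    return 1.0 - predicted_force  # safe while predicted force < 1.0

def recovery_action():
    """Reflexive fallback: retract to reduce contact force."""
    return -0.5

def run_episode(steps=20, critic_per_policy=5, safety_threshold=0.0):
    state = {"force": 0.0}
    interventions = 0
    action = 0.0
    for t in range(steps):
        if t % critic_per_policy == 0:
            action = task_policy(state)   # slow task-level decision
        # Safety critic is evaluated every tick, i.e. at a higher
        # frequency than the policy, and can override the action.
        if safety_critic(state, action) < safety_threshold:
            action = recovery_action()    # real-time intervention
            interventions += 1
        state["force"] = max(0.0, state["force"] + 0.1 * action)
    return state, interventions
```

Running the loop, the task policy repeatedly drives the contact force up, and the high-rate safety check intervenes before the force model's unsafe region is entered, which is the reflexive behavior the decoupling is meant to provide.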
Experiments show that Bresa outperforms the baseline across multiple tasks, providing a reflexive safety mechanism that bridges the gap between high-level planning and low-level execution.