Researchers from Imperial College London introduced TD3-BST (TD3 with Behavioral Supervisor Tuning), an algorithm that uses an uncertainty model to adjust the strength of regularization dynamically.
The uncertainty network lets the policy optimize Q-values around dataset modes while the regularization strength adapts dynamically.
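To make the idea of uncertainty-scaled regularization concrete, here is a minimal sketch in Python. It is not the formulation from the TD3-BST paper: the linear mapping from uncertainty to penalty weight, the squared-error behavior penalty, and the names `regularization_weight` and `actor_objective` are all assumptions chosen for illustration.

```python
from typing import Sequence


def regularization_weight(uncertainty: float, max_weight: float = 1.0) -> float:
    """Map an uncertainty score to a penalty weight.

    Hypothetical mapping: the score is clipped to [0, 1] and scaled linearly,
    so actions the uncertainty model considers well supported (score near 0)
    are barely penalized, and unsupported actions (score near 1) are
    penalized at full strength.
    """
    clipped = min(max(uncertainty, 0.0), 1.0)
    return max_weight * clipped


def actor_objective(
    q_value: float,
    policy_action: Sequence[float],
    dataset_action: Sequence[float],
    uncertainty: float,
) -> float:
    """Q-value minus an uncertainty-weighted behavior penalty (to be maximized).

    The penalty is the squared distance between the policy's action and the
    dataset action, an assumed stand-in for a behavior regularizer.
    """
    bc_penalty = sum((p - d) ** 2 for p, d in zip(policy_action, dataset_action))
    return q_value - regularization_weight(uncertainty) * bc_penalty


# Same Q-value and same deviation from the dataset action, different uncertainty.
confident = actor_objective(1.0, [0.5], [0.0], uncertainty=0.0)
uncertain = actor_objective(1.0, [0.5], [0.0], uncertainty=1.0)
```

In this sketch the objective for the confident case equals the raw Q-value, while the uncertain case is reduced by the full behavior penalty, so the policy is pushed back toward the data only where the uncertainty score is high.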
When tested on D4RL datasets, TD3-BST outperforms the methods it was compared against and achieves state-of-the-art performance.
Combining policy regularization with an ensemble-based source of uncertainty improves the performance of TD3-BST.