Machine unlearning is a promising approach to improve LLM safety.
Sparse Autoencoders (SAEs) can significantly improve unlearning when employed dynamically.
The proposed method, Dynamic SAE Guardrails (DSG), outperforms leading unlearning methods.
DSG addresses key drawbacks of gradient-based approaches, offering improved computational efficiency and stability, robust performance in sequential unlearning, and better data efficiency.
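
To make "employed dynamically" concrete, here is a minimal sketch of one way an SAE-based guardrail could be applied conditionally at inference time. It is an illustration under stated assumptions, not the method's actual algorithm: the toy encoder and decoder, the `forget_features` index list, the `trigger_fraction` threshold, and the `clamp_value` are all hypothetical names and choices introduced here. The idea shown is that the intervention fires only when enough tokens activate features associated with the forget set, so benign inputs pass through unchanged.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sae_encode(h: Sequence[float], W_enc: Sequence[Sequence[float]],
               b_enc: Sequence[float]) -> Vector:
    """Toy SAE encoder: ReLU(W_enc @ h + b_enc), one row of W_enc per feature."""
    return [max(0.0, sum(w * x for w, x in zip(row, h)) + b)
            for row, b in zip(W_enc, b_enc)]


def sae_decode(f: Sequence[float], W_dec: Sequence[Sequence[float]],
               b_dec: Sequence[float]) -> Vector:
    """Toy SAE decoder: W_dec @ f + b_dec, one row of W_dec per hidden dim."""
    return [sum(w * a for w, a in zip(row, f)) + b
            for row, b in zip(W_dec, b_dec)]


def dynamic_guardrail(hidden_states: Sequence[Sequence[float]],
                      W_enc, b_enc, W_dec, b_dec,
                      forget_features: Sequence[int],
                      trigger_fraction: float = 0.5,   # assumed threshold
                      clamp_value: float = 0.0         # assumed clamp target
                      ) -> Tuple[List[Vector], bool]:
    """Clamp forget-associated SAE features only if the input triggers the guard."""
    if not hidden_states:
        return [], False

    encoded = [sae_encode(h, W_enc, b_enc) for h in hidden_states]

    # Sequence-level statistic: fraction of tokens with any forget feature active.
    flagged = sum(1 for f in encoded if any(f[i] > 0.0 for i in forget_features))
    triggered = flagged / len(encoded) >= trigger_fraction

    if not triggered:
        # Benign input: leave the activations untouched.
        return [list(h) for h in hidden_states], False

    edited = []
    for h, f in zip(hidden_states, encoded):
        clamped = list(f)
        for i in forget_features:
            clamped[i] = clamp_value
        # Apply only the change in reconstruction, so whatever the SAE
        # fails to reconstruct in h is preserved.
        before = sae_decode(f, W_dec, b_dec)
        after = sae_decode(clamped, W_dec, b_dec)
        edited.append([x + (a - b) for x, a, b in zip(h, after, before)])
    return edited, True


# Usage with an identity "SAE" over a 2-dimensional hidden state,
# where feature 1 is (hypothetically) tied to the forget set.
I = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
benign, fired_benign = dynamic_guardrail([[1.0, 0.0], [2.0, 0.0]],
                                         I, zeros, I, zeros, [1])
harmful, fired_harmful = dynamic_guardrail([[1.0, 3.0], [0.5, 2.0]],
                                           I, zeros, I, zeros, [1])
```

Because no gradient update is involved, a guardrail of this kind leaves the model weights untouched, and extending it to a new forget set amounts to extending the feature list rather than retraining.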