Enabling large language models (LLMs) to unlearn knowledge and capabilities acquired during training has become crucial for complying with data regulations and promoting ethical practices in generative AI.
Existing unlearning algorithms struggle to formulate the unlearning problem effectively: the most common approach optimizes a weighted combination of a forget loss and a retain loss, which often degrades model performance.
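To make this concrete, such scalarized methods typically optimize a single trade-off objective of the following form (the notation here is illustrative rather than the paper's, and the specific losses vary across methods):

\[
\min_{\theta} \; \ell_f(\theta; \mathcal{D}_{\mathrm{forget}}) \;+\; \lambda\, \ell_r(\theta; \mathcal{D}_{\mathrm{retain}}),
\]

where \(\ell_f\) is a forget objective (for example, the negated training loss on the forget data, so that minimizing it drives forgetting), \(\ell_r\) is the standard loss on the retain data, and \(\lambda\) is a hand-tuned weight that trades the two goals off against each other.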
This work proposes a new approach, Bi-Level UnleaRning (BLUR), which imposes a hierarchical structure on unlearning: forgetting the targeted knowledge and capabilities takes precedence over retaining model utility.
Built on a bi-level optimization formulation, BLUR offers strong theoretical guarantees and outperforms existing algorithms across a variety of unlearning tasks, models, and metrics.
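One schematic way to read this hierarchy as a bi-level program is sketched below; the symbols \(\ell_f\) and \(\ell_r\) reuse the illustrative notation above, and the precise formulation (and its practical relaxation) is the one given in the paper, not this sketch:

\[
\min_{\theta \in \Theta^{*}} \; \ell_r(\theta; \mathcal{D}_{\mathrm{retain}})
\qquad \text{where} \qquad
\Theta^{*} = \arg\min_{\theta} \; \ell_f(\theta; \mathcal{D}_{\mathrm{forget}}).
\]

Under this reading, forgetting is enforced by the inner problem, and model utility is recovered only among solutions that already achieve forgetting, rather than being traded off against it through a fixed weight.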