Machine unlearning addresses privacy and safety concerns in large language models (LLMs) by selectively removing targeted knowledge.
Current unlearning methods are brittle under downstream fine-tuning, which can recover supposedly forgotten information even when the fine-tuning task is unrelated to the forgotten content.
Introducing invariance into the unlearning objective, via invariant LLM unlearning (ILU), improves robustness and generalizes to diverse downstream fine-tuning tasks (see the sketch below).
ILU outperforms existing unlearning methods such as negative preference optimization (NPO) and representation misdirection for unlearning (RMU), maintaining superior robustness across a range of fine-tuning scenarios.
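As a rough illustration of the invariance idea, and not the authors' exact ILU formulation, the Python sketch below pairs a gradient-ascent forget term with an IRMv1-style penalty that encourages the retain loss to stay simultaneously optimal across several auxiliary "environments" (stand-ins for diverse fine-tuning tasks). The function names, the `loss_fn` interface, and the weights `lam` and `gamma` are all illustrative assumptions; `model` is assumed to be a Hugging Face-style causal LM exposing `.logits`.

```python
import torch

def irm_penalty(loss_fn, model, batch):
    # IRMv1-style penalty: squared gradient of the environment loss with
    # respect to a dummy scale w = 1.0 applied to the model's logits.
    w = torch.tensor(1.0, requires_grad=True)
    logits = model(batch["input_ids"]).logits * w
    loss = loss_fn(logits, batch["labels"])
    (grad_w,) = torch.autograd.grad(loss, [w], create_graph=True)
    return grad_w.pow(2)

def invariant_unlearning_loss(model, forget_batch, retain_envs,
                              loss_fn, lam=1.0, gamma=1.0):
    # Illustrative invariance-regularized unlearning objective (not the exact
    # ILU loss): push the model away from the forget set while keeping the
    # retain loss low and invariant across auxiliary environments.
    # loss_fn is e.g. a token-level cross-entropy over (logits, labels).
    forget_logits = model(forget_batch["input_ids"]).logits
    forget_term = -loss_fn(forget_logits, forget_batch["labels"])  # ascent on forget data

    retain_term, penalty = 0.0, 0.0
    for env_batch in retain_envs:  # e.g. batches drawn from unrelated tasks
        retain_logits = model(env_batch["input_ids"]).logits
        retain_term = retain_term + loss_fn(retain_logits, env_batch["labels"])
        penalty = penalty + irm_penalty(loss_fn, model, env_batch)

    n = len(retain_envs)
    return forget_term + lam * (retain_term / n) + gamma * (penalty / n)
```

The intuition carried by this sketch is that a retain loss forced to be optimal across several heterogeneous environments is less likely to be undone by whatever single fine-tuning task comes later.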