Large language model (LLM) unlearning is the task of removing the influence of specific training data from a model without retraining it from scratch, and it has become increasingly important in machine learning practice.
Techniques such as gradient ascent on the forget data, model editing, and re-steering of hidden representations have been proposed for LLM unlearning.
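To make the first of these concrete, the sketch below illustrates gradient-ascent unlearning: fine-tuning on the forget set with the language-modeling loss negated, so a standard optimizer pushes the model away from that data rather than toward it. The Hugging Face-style model interface, the function name, and the hyperparameters are assumptions for illustration, not the exact recipe of any particular surveyed method.

```python
# Minimal sketch of gradient-ascent unlearning, assuming a Hugging Face-style
# causal LM and a DataLoader over the forget set. Names and hyperparameters
# are illustrative, not a specific method from the paper.
import torch

def gradient_ascent_unlearn(model, forget_loader, lr=1e-5, max_steps=100, device="cuda"):
    """Fine-tune with the negated LM loss so the optimizer ascends the loss
    surface on the forget examples, weakening the model's fit to them."""
    model.to(device).train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    step = 0
    while step < max_steps:
        for batch in forget_loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            outputs = model(**batch, labels=batch["input_ids"])
            loss = -outputs.loss  # negate: a standard minimizer now performs ascent
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1
            if step >= max_steps:
                break
    return model
```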
The paper proposes an intention-oriented taxonomy that classifies unlearning methods by whether they aim to truly remove internal knowledge or merely suppress its behavioral effects.
It revisits findings suggesting that many removal-oriented methods functionally behave like suppression, and examines whether true removal is necessary and achievable.
The paper also surveys existing evaluation strategies for unlearning, critiques current metrics and benchmarks, and offers suggestions for more reliable evaluation.
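As a rough illustration of what such evaluations typically measure, the sketch below compares perplexity on a forget split against a retain split; effective unlearning should degrade the former while leaving the latter largely intact. The helper names and the perplexity-based metric are assumptions for illustration and do not correspond to any specific benchmark discussed in the paper.

```python
# Minimal sketch of a forget-vs-retain evaluation, assuming the same
# Hugging Face-style interface as above; metric and names are illustrative.
import math
import torch

@torch.no_grad()
def mean_lm_loss(model, loader, device="cuda"):
    """Average language-modeling loss over a dataloader."""
    model.to(device).eval()
    total, count = 0.0, 0
    for batch in loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        out = model(**batch, labels=batch["input_ids"])
        total += out.loss.item()
        count += 1
    return total / max(count, 1)

def unlearning_report(model, forget_loader, retain_loader):
    """Perplexity should rise on the forget split while staying roughly flat
    on the retain split; if both rise, the method is likely over-forgetting."""
    return {
        "forget_ppl": math.exp(mean_lm_loss(model, forget_loader)),
        "retain_ppl": math.exp(mean_lm_loss(model, retain_loader)),
    }
```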
It further highlights practical challenges to the broader deployment of unlearning methods, such as scalability and support for sequential unlearning requests.
This work aims to provide a comprehensive framework for understanding and advancing unlearning in generative AI, supporting future research and guiding policy decisions on data removal and privacy.