<ul><li>Recent studies suggest that unlearning in large language models is often shallow: supposedly removed knowledge can be easily recovered.</li><li>Standard unlearning evaluation practices have important limitations. The testing process itself can introduce new information into the model, and evaluation outcomes can vary significantly across tasks.</li><li>Many evaluations rely on spurious correlations, which makes their results difficult to trust and interpret.</li><li>To improve unlearning evaluations, two principles are proposed, minimal information injection and downstream task awareness, and both are validated through targeted experiments.</li></ul>

Existing Large Language Model Unlearning Evaluations Are Inconclusive
