Machine unlearning aims to remove the influence of specific data from a trained model, but existing metrics such as unlearning accuracy (UA) and membership inference attacks (MIA) fall short in assessing how reliably forgetting has occurred.
A new approach inspired by conformal prediction introduces two novel metrics that evaluate forgetting quality more reliably, addressing the issue of fake unlearning, where forget-set samples are misclassified in the top-1 sense yet their ground-truth labels still appear in the conformal prediction set.
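To make the fake-unlearning check concrete, here is a minimal sketch of how such a conformal test might look. The split-conformal setup, the 1 − softmax nonconformity score, and the names `conformal_quantile`, `prediction_set`, and `fake_unlearning_rate` are illustrative assumptions, not the paper's actual metric definitions.

```python
import numpy as np

def conformal_quantile(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold from held-out calibration data.
    Nonconformity score: 1 - softmax probability of the true label."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected quantile level.
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, q_level, method="higher")

def prediction_set(probs, q_hat):
    """All labels whose nonconformity score is within the threshold."""
    return np.where(1.0 - probs <= q_hat)[0]

def fake_unlearning_rate(forget_probs, forget_labels, q_hat):
    """Fraction of 'forgotten' samples whose ground-truth label
    still appears in the conformal prediction set, even if the
    top-1 prediction is already wrong."""
    hits = [
        y in prediction_set(p, q_hat)
        for p, y in zip(forget_probs, forget_labels)
    ]
    return float(np.mean(hits))
```

A sample can thus count as "unlearned" under UA while still being flagged here, which is exactly the gap the proposed metrics target.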
A conformal prediction-based unlearning framework is then proposed: it integrates conformal prediction into the Carlini & Wagner (C&W) adversarial attack loss so that ground-truth labels are driven out of the conformal prediction set.
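One plausible instantiation of this integration is sketched below, assuming the threshold `q_hat` comes from a calibration split as in the previous sketch: a C&W-style hinge that penalizes the model whenever the true label's nonconformity score stays at or below the threshold, i.e. whenever the label is still inside the prediction set. The function name and the margin parameter `kappa` are hypothetical, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def cw_conformal_loss(logits, labels, q_hat, kappa=0.0):
    """C&W-style margin loss adapted to conformal prediction (a sketch).

    Pushes the true label's nonconformity score (1 - softmax prob)
    above the calibrated threshold q_hat, so that the ground-truth
    label drops out of the conformal prediction set; kappa adds an
    extra margin beyond the threshold.
    """
    probs = F.softmax(logits, dim=1)
    true_prob = probs.gather(1, labels.unsqueeze(1)).squeeze(1)
    score = 1.0 - true_prob  # nonconformity of the true label
    # Penalize only while the true label is still inside the set
    # (score <= q_hat); zero once it is excluded by margin kappa.
    return torch.clamp(q_hat + kappa - score, min=0.0).mean()
```

In an unlearning pipeline, a term like this would be added to an existing forgetting objective on the forget set, e.g. `total = base_loss + lam * cw_conformal_loss(logits, labels, q_hat)`, which matches the paper's claim that gains come from the loss term alone.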
Extensive experiments on image classification tasks demonstrate the effectiveness of the new metrics and the superiority of the unlearning framework, which improves the UA of existing methods by an average of 6.6% solely through the addition of the tailored loss term.