<ul><li>Researchers propose the SUV framework to prevent Large Language Models (LLMs) from memorizing copyrighted content while preserving their overall utility.</li><li>SUV constructs a dataset of copyright infringement instances and unlearns that content from LLMs using Direct Preference Optimization (DPO).</li><li>To mitigate the degradation in LLMs' performance on unrelated tasks, SUV integrates gradient projection and Fisher information regularization.</li><li>Experiments on a large-scale dataset of 500 copyrighted books demonstrate the scalability and efficacy of SUV in reducing verbatim memorization without significantly affecting performance on unrelated tasks.</li></ul>
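
The three ingredients named above (a DPO objective, gradient projection, and a Fisher information penalty) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The function names, the choice of a safe alternative as the "chosen" response and the verbatim copyrighted continuation as the "rejected" one, the conflict-removal projection rule, and the diagonal Fisher penalty are all assumptions made for illustration; the summary does not specify these details.

```python
import math


def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair, given sequence log-probabilities.

    Assumption: 'chosen' is a non-infringing response and 'rejected' is the
    verbatim copyrighted continuation. 'pol' is the model being trained,
    'ref' is the frozen reference model.
    """
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    # -log(sigmoid(margin)) computed as a numerically stable softplus(-margin).
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def project_gradient(g_unlearn, g_retain):
    """Remove the part of the unlearning gradient that conflicts with retention.

    Assumption: if the two gradients have a negative dot product, subtract the
    component of g_unlearn along g_retain; otherwise leave g_unlearn unchanged.
    """
    dot = sum(u * r for u, r in zip(g_unlearn, g_retain))
    norm_sq = sum(r * r for r in g_retain)
    if dot >= 0.0 or norm_sq == 0.0:
        return list(g_unlearn)
    scale = dot / norm_sq
    return [u - scale * r for u, r in zip(g_unlearn, g_retain)]


def fisher_penalty(theta, theta_ref, fisher_diag, lam=1.0):
    """Quadratic penalty weighted by a diagonal Fisher information estimate.

    Assumption: parameters that matter more for general capability (larger
    Fisher value) are penalized more for drifting from their original values.
    """
    return 0.5 * lam * sum(
        f * (t - t0) ** 2 for t, t0, f in zip(theta, theta_ref, fisher_diag)
    )


if __name__ == "__main__":
    # Policy equals reference, so the margin is zero and the loss is log(2).
    print(dpo_loss(-5.0, -3.0, -5.0, -3.0))
    # Conflicting gradients: the component along g_retain is removed.
    print(project_gradient([1.0, -1.0], [0.0, 1.0]))
    # Only the second parameter moved, and it has the larger Fisher weight.
    print(fisher_penalty([1.0, 2.0], [1.0, 0.0], [0.5, 2.0]))
```

In a sketch like this, the projected DPO gradient would drive the update while the Fisher penalty is added to the loss, so the model is pushed away from verbatim continuations without drifting far on parameters that unrelated tasks depend on.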

SUV: Scalable Large Language Model Copyright Compliance with Regularized Selective Unlearning
