A new study presents the Junk DNA Hypothesis, which concerns the small-magnitude pre-trained weights of large language models such as GPT-3. The hypothesis challenges the common belief that these small weights are redundant and can be pruned without hurting performance, arguing instead that they encode knowledge essential for solving harder downstream tasks.
Removing these seemingly insignificant weights causes an irreversible loss of that knowledge and a performance drop on difficult tasks that cannot be recovered even with continued training.
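To make the pruning operation concrete, here is a minimal sketch of global magnitude pruning in PyTorch. The function name, the `sparsity` parameter, and the threshold selection are illustrative assumptions, not the study's exact procedure.

```python
import torch

def magnitude_prune(model: torch.nn.Module, sparsity: float = 0.5) -> None:
    """Zero out the smallest-magnitude weights across the whole model.

    `sparsity` is the fraction of weights to remove; under the Junk DNA
    Hypothesis, removing them should hurt hard tasks the most.
    """
    # Gather the absolute values of every weight parameter.
    all_weights = torch.cat([
        p.detach().abs().flatten()
        for name, p in model.named_parameters()
        if name.endswith("weight")
    ])
    # Magnitude threshold below which weights are treated as "junk DNA".
    k = max(1, int(sparsity * all_weights.numel()))
    threshold = all_weights.kthvalue(k).values

    with torch.no_grad():
        for name, p in model.named_parameters():
            if name.endswith("weight"):
                p.mul_((p.abs() > threshold).to(p.dtype))  # zero the small ones

# Example on a toy model: prune 60% of the weights.
model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 4))
magnitude_prune(model, sparsity=0.6)
```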
According to the study, quantization, another common compression method, does not exhibit the same effect as weight pruning in exposing task-difficulty information. Extensive experiments support the Junk DNA Hypothesis.
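For contrast, below is a minimal sketch of symmetric round-to-nearest quantization, a simple illustrative stand-in assuming per-tensor scaling; the study's exact quantization schemes may differ. Unlike pruning, small weights are rescaled to a coarser grid rather than deliberately zeroed, one plausible reason quantization behaves differently.

```python
import torch

def quantize_dequantize(w: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Symmetric round-to-nearest quantization of a weight tensor.

    Weights are snapped to a low-precision grid rather than removed,
    so the information carried by small-magnitude weights is largely
    preserved instead of being wiped out.
    """
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for int8
    scale = w.abs().max() / qmax          # per-tensor scale factor
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q * scale                      # dequantize back to float

# Small weights lose precision but mostly survive:
w = torch.tensor([0.01, -0.02, 0.5, -1.0])
print(quantize_dequantize(w, bits=8))
```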