Large Language Models (LLMs) are widely used for various tasks, but they come at the cost of long training times and large model sizes.
Pruning methods such as Wanda can reduce computational demands without retraining while largely preserving performance.
This study provides a theoretical explanation of Wanda's effectiveness and introduces STADE, a new pruning method based on the standard deviation of the input.
Experiments on Llama and Open Pre-trained Transformers (OPT) models validate the theoretical findings, demonstrating that whether Wanda performs optimally depends on the training conditions.
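To make the contrast concrete, the sketch below implements Wanda-style magnitude-times-activation pruning and a hypothetical STADE-style variant that replaces the activation norm with the per-feature standard deviation of the input. The function names, the per-row pruning granularity, and the exact STADE formula are illustrative assumptions, not the paper's definitive implementation.

```python
import numpy as np

def wanda_scores(W, X):
    # Wanda scores each weight by |W_ij| * ||X_j||_2, where X holds
    # calibration activations with shape (n_samples, in_features).
    return np.abs(W) * np.linalg.norm(X, axis=0)

def stade_scores(W, X):
    # Assumed STADE-style variant: score by |W_ij| * std(X_j),
    # using the standard deviation of each input feature.
    return np.abs(W) * X.std(axis=0)

def prune(W, scores, sparsity=0.5):
    # Per-output-row pruning: zero the lowest-scoring fraction
    # of weights in each row of W.
    W = W.copy()
    k = int(W.shape[1] * sparsity)
    idx = np.argsort(scores, axis=1)[:, :k]
    np.put_along_axis(W, idx, 0.0, axis=1)
    return W

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))            # toy weight matrix
X = rng.normal(size=(64, 8))           # toy calibration activations
W_wanda = prune(W, wanda_scores(W, X), sparsity=0.5)
W_stade = prune(W, stade_scores(W, X), sparsity=0.5)
```

When the calibration inputs are zero-mean, the norm and standard deviation rank features similarly; the two criteria diverge as input means shift, which is one intuition for why training conditions matter.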