Large Language Models (LLMs) are widely used for various tasks, but they come at the cost of long training times and large model sizes.
Pruning methods such as Wanda can reduce computational demands without retraining while largely preserving performance.
This study provides a theoretical explanation of Wanda's effectiveness and introduces STADE, a new pruning method based on the standard deviation of the input.
Experiments on Llama and Open Pre-trained Transformers (OPT) models validate the theoretical findings, demonstrating that whether Wanda performs optimally depends on the training conditions.
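To make the contrast concrete, the sketch below implements Wanda-style magnitude-times-activation pruning and a hypothetical STADE-style variant that replaces the activation norm with the per-feature standard deviation of the input. The function names, the per-row pruning granularity, and the exact STADE formula are illustrative assumptions, not the paper's definitive implementation.

```python
import numpy as np

def wanda_scores(W, X):
    # Wanda scores each weight by |W_ij| * ||X_j||_2, where X holds
    # calibration activations with shape (n_samples, in_features).
    return np.abs(W) * np.linalg.norm(X, axis=0)

def stade_scores(W, X):
    # Assumed STADE-style variant: score by |W_ij| * std(X_j),
    # using the standard deviation of each input feature.
    return np.abs(W) * X.std(axis=0)

def prune(W, scores, sparsity=0.5):
    # Per-output-row pruning: zero the lowest-scoring fraction
    # of weights in each row of W.
    W = W.copy()
    k = int(W.shape[1] * sparsity)
    idx = np.argsort(scores, axis=1)[:, :k]
    np.put_along_axis(W, idx, 0.0, axis=1)
    return W

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))            # toy weight matrix
X = rng.normal(size=(64, 8))           # toy calibration activations
W_wanda = prune(W, wanda_scores(W, X), sparsity=0.5)
W_stade = prune(W, stade_scores(W, X), sparsity=0.5)
```

When the calibration inputs are zero-mean, the norm and standard deviation rank features similarly; the two criteria diverge as input means shift, which is one intuition for why training conditions matter.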