Pruning is a common technique to compress large language models by removing unimportant weights, but it often leads to performance degradation, especially under semi-structured sparsity constraints.
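As a rough illustration of what semi-structured sparsity means in practice, the sketch below shows 2:4 magnitude pruning in PyTorch, where only the two largest-magnitude entries in every group of four consecutive weights are kept. This is a minimal, assumed example, not code from the paper; the function name and tensor shapes are hypothetical.

```python
import torch

def prune_2_4_by_magnitude(weight: torch.Tensor) -> torch.Tensor:
    """Zero out the 2 smallest-magnitude weights in every group of 4
    along the input dimension (a common 2:4 semi-structured pattern)."""
    out_features, in_features = weight.shape
    assert in_features % 4 == 0, "input dim must be divisible by 4"
    groups = weight.abs().reshape(out_features, in_features // 4, 4)
    # Keep the 2 largest-magnitude entries in each group of 4.
    keep_idx = groups.topk(k=2, dim=-1).indices
    mask = torch.zeros_like(groups)
    mask.scatter_(-1, keep_idx, 1.0)
    return weight * mask.reshape(out_features, in_features)

# Example: prune a random linear layer's weight to 2:4 sparsity.
w = torch.randn(8, 16)
w_sparse = prune_2_4_by_magnitude(w)
assert (w_sparse.reshape(8, -1, 4) != 0).sum(-1).max() <= 2
```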
A new approach called DenoiseRotator is proposed to enhance pruning robustness by redistributing parameter importance to make the model more amenable to pruning.
DenoiseRotator minimizes the information entropy of normalized importance scores, concentrating importance onto a smaller subset of weights, thus improving pruning effectiveness.
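For intuition, the entropy objective can be read as the Shannon entropy of importance scores normalized into a probability distribution: the lower the entropy, the more importance is concentrated on a few weights, leaving clearly unimportant weights to remove. The helper below is a hypothetical sketch, assuming nonnegative per-weight importance scores (e.g., magnitudes), and is not the paper's implementation.

```python
import torch

def importance_entropy(importance: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Shannon entropy of importance scores normalized into a
    probability distribution. Lower entropy means importance is
    concentrated on fewer weights, making pruning less damaging."""
    p = importance.flatten() / (importance.sum() + eps)
    return -(p * (p + eps).log()).sum()

# A uniform importance profile has higher entropy than a concentrated one,
# so the concentrated profile is the easier one to prune.
uniform = torch.ones(8)
concentrated = torch.tensor([8.0] + [1e-3] * 7)
assert importance_entropy(concentrated) < importance_entropy(uniform)
```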
Evaluation on various models shows that DenoiseRotator consistently improves perplexity and zero-shot accuracy compared to existing pruning techniques such as Magnitude, SparseGPT, and Wanda.