Source: arXiv

3d

read

148

img
dot

Image Credit: Arxiv

STADE: Standard Deviation as a Pruning Metric

  • Large Language Models (LLMs) are used for a wide range of tasks, but they are large and take a long time to train.
  • Pruning methods such as Wanda reduce computational demands without retraining while largely maintaining model performance.
  • This study gives a theoretical explanation of why Wanda is effective and introduces STADE, a new pruning method based on the standard deviation of the input.
  • Experiments on Llama and Open Pre-trained Transformer (OPT) models validate the theoretical findings, showing that whether Wanda performs optimally depends on the training conditions.
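
To make the difference between the two kinds of pruning metric concrete, here is a minimal sketch in plain Python. It assumes the commonly described Wanda score (weight magnitude times the L2 norm of the matching input feature) and, purely for illustration, a standard-deviation score of the form weight magnitude times the per-feature standard deviation. The summary above does not give STADE's exact formula, so `std_scores` is a hypothetical stand-in rather than the paper's definition; the function names, the toy data, and the per-row pruning helper are also assumptions.

```python
import math
import statistics


def wanda_scores(weights, inputs):
    """Wanda-style score: |W[i][j]| * L2 norm of input feature j over the calibration samples."""
    n_features = len(weights[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in inputs)) for j in range(n_features)]
    return [[abs(w) * norms[j] for j, w in enumerate(row)] for row in weights]


def std_scores(weights, inputs):
    """ASSUMED std-based score (not taken from the paper): |W[i][j]| * population std of feature j."""
    n_features = len(weights[0])
    stds = [statistics.pstdev([row[j] for row in inputs]) for j in range(n_features)]
    return [[abs(w) * stds[j] for j, w in enumerate(row)] for row in weights]


def prune_rows(weights, scores, sparsity):
    """Zero the lowest-scoring fraction of weights within each output row."""
    pruned = []
    for w_row, s_row in zip(weights, scores):
        k = int(len(w_row) * sparsity)  # number of weights to drop in this row
        drop = set(sorted(range(len(w_row)), key=lambda j: s_row[j])[:k])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(w_row)])
    return pruned


# Toy calibration batch: feature 0 is large but constant, feature 1 is small but varies.
X = [[10.0, -1.0], [10.0, 1.0], [10.0, -1.0], [10.0, 1.0]]
W = [[0.5, 1.0]]  # one output neuron, two input features

wanda_pruned = prune_rows(W, wanda_scores(W, X), sparsity=0.5)
std_pruned = prune_rows(W, std_scores(W, X), sparsity=0.5)
print(wanda_pruned)
print(std_pruned)
```

On this toy input the two scores rank the weights differently: the norm-based score keeps the weight on the large constant feature, while the standard-deviation score keeps the weight on the feature that actually varies. This only illustrates how the choice of input statistic changes which weights survive; it says nothing about which choice is better in practice.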
