Large Language Models (LLMs) deliver strong performance but are difficult to deploy in practice because of their sheer parameter count.
Efforts have been made to apply traditional network pruning techniques to LLMs to reduce their size while preserving performance.
A new pruning methodology called Outlier Weighed Layerwise sparsity (OWL) has been introduced, which assigns each layer a non-uniform sparsity ratio derived from the fraction of outlier weights in that layer, pruning less aggressively in layers where outliers are concentrated.
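To make the idea concrete, the following is a minimal sketch, not the paper's exact calibration procedure. It assumes Wanda-style outlier scores |W_ij| * ||X_j||_2, and the threshold multiplier `m`, the deviation bound `lam`, and both function names are hypothetical choices for illustration.

```python
import numpy as np

def layer_outlier_ratio(weight, act_norm, m=5.0):
    """Fraction of weights in one layer counted as outliers.

    Hedged sketch: a weight is treated as an outlier when its
    Wanda-style score |W_ij| * ||X_j||_2 exceeds `m` times the
    layer's mean score (`m` is an assumed hyperparameter).

    weight:   (out_features, in_features) weight matrix
    act_norm: (in_features,) L2 norms of the calibration activations
    """
    scores = np.abs(weight) * act_norm  # broadcast over input features
    return float((scores > m * scores.mean()).mean())

def owl_sparsities(outlier_ratios, target=0.7, lam=0.08):
    """Map per-layer outlier ratios to non-uniform sparsity ratios.

    Layers with more outliers receive lower sparsity. Ratios are
    shifted to average `target`, then bounded to
    [target - lam, target + lam]; clipping can move the mean
    slightly off target, which the paper's calibration avoids.
    """
    d = np.asarray(outlier_ratios, dtype=float)
    s = 1.0 - d                   # more outliers -> prune less
    s = s - s.mean() + target     # center at the global target sparsity
    return np.clip(s, target - lam, target + lam)
```

Under this scheme, a layer whose outlier ratio is above average ends up below the global target sparsity, so the weights most responsible for outlier activations are preferentially retained.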
Empirical evaluations show that OWL outperforms prior uniform-sparsity methods such as SparseGPT and Wanda, achieving substantially lower perplexity and faster end-to-end inference at high sparsity levels (e.g., 70%).