A new pruning method called Týr-the-Pruner has been proposed to enhance hardware-agnostic inference efficiency for large language models (LLMs).
Týr-the-Pruner is an end-to-end search-based global structural pruning framework that aims to determine the optimal sparsity distribution under a target overall sparsity ratio.
The framework constructs a supernet using local pruning and expectation error accumulation approaches, and employs an iterative prune-and-search strategy for efficient convergence.
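The idea of searching for a per-layer sparsity distribution under a global sparsity budget can be sketched in a toy form. The snippet below is a minimal illustration only, assuming uniformly sized layers and a made-up quadratic sensitivity-based error model; it does not reproduce the paper's supernet construction or expectation error accumulation, and all names (`estimate_error`, `search_sparsity_distribution`) are hypothetical.

```python
import itertools

def estimate_error(layer_sensitivity, sparsity):
    # Toy error model (an assumption, not the paper's): more sensitive
    # layers are penalized more heavily at higher sparsity.
    return layer_sensitivity * sparsity ** 2

def search_sparsity_distribution(sensitivities, target, levels):
    """Exhaustively search per-layer sparsity levels whose mean meets the
    target overall sparsity, minimizing the summed toy error estimate."""
    n = len(sensitivities)
    best, best_err = None, float("inf")
    for combo in itertools.product(levels, repeat=n):
        # Enforce the overall sparsity budget (uniform layer sizes assumed).
        if abs(sum(combo) / n - target) > 1e-9:
            continue
        err = sum(estimate_error(s, sp) for s, sp in zip(sensitivities, combo))
        if err < best_err:
            best, best_err = combo, err
    return best, best_err

# Three layers with different (hypothetical) sensitivities; 50% overall target.
dist, err = search_sparsity_distribution([1.0, 0.5, 2.0], 0.5, [0.3, 0.5, 0.7])
print(dist)  # the least-sensitive layer receives the highest sparsity
```

Even this toy search assigns the highest sparsity to the least-sensitive layer while keeping the average at the target, which is the intuition behind optimizing a non-uniform sparsity distribution instead of pruning every layer equally; the actual framework replaces the exhaustive loop with an iterative prune-and-search procedure over a supernet.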
Experimental results demonstrate that Týr-the-Pruner achieves state-of-the-art structural pruning performance, retaining 97% of the dense model's capability while removing 50% of Llama-3.1-70B's parameters.