<ul><li>Mosaic is a system for creating and deploying pruned large language models (LLMs) using composite projection pruning.</li><li>Projection pruning is a fine-grained method that shrinks an LLM by removing unnecessary model parameters.</li><li>Composite projection pruning combines unstructured and structured pruning to balance accuracy against model size reduction.</li><li>Compared with existing approaches, Mosaic produces models 7.19 times faster, and the resulting models achieve up to 84.2% lower perplexity and 31.4% higher accuracy while also improving inference speed and GPU memory utilization.</li></ul>
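To make the distinction between the two pruning styles concrete, here is a minimal sketch in plain Python. It is not Mosaic's algorithm: the function names, the magnitude and row-norm criteria, and the fixed pruning fractions are all assumptions chosen for illustration. Unstructured pruning zeroes individual weights and keeps the matrix shape, structured pruning removes whole rows and shrinks it, and the composite step simply applies one after the other.

```python
from typing import List

Matrix = List[List[float]]


def unstructured_prune(weights: Matrix, sparsity: float) -> Matrix:
    """Zero the smallest-magnitude entries; the matrix shape is unchanged."""
    ranked = sorted(
        (abs(w), i, j) for i, row in enumerate(weights) for j, w in enumerate(row)
    )
    n_zero = int(len(ranked) * sparsity)  # number of entries to zero
    pruned = [row[:] for row in weights]
    for _, i, j in ranked[:n_zero]:
        pruned[i][j] = 0.0
    return pruned


def structured_prune(weights: Matrix, fraction: float) -> Matrix:
    """Drop the rows with the smallest squared L2 norm; the matrix shrinks."""
    n_drop = int(len(weights) * fraction)
    ranked = sorted((sum(w * w for w in row), i) for i, row in enumerate(weights))
    dropped = {i for _, i in ranked[:n_drop]}
    return [row[:] for i, row in enumerate(weights) if i not in dropped]


def composite_prune(
    weights: Matrix, structured_fraction: float, unstructured_sparsity: float
) -> Matrix:
    """Hypothetical composite step: remove rows first, then zero small weights."""
    smaller = structured_prune(weights, structured_fraction)
    return unstructured_prune(smaller, unstructured_sparsity)


# Toy 4x3 "projection" matrix (made-up values, not from any real model).
example = [
    [0.9, -0.1, 0.4],
    [0.05, 0.02, -0.03],
    [-0.7, 0.6, 0.2],
    [0.3, -0.8, 0.01],
]
result = composite_prune(example, structured_fraction=0.25, unstructured_sparsity=0.5)
for row in result:
    print(row)
```

In this toy run the second row has the smallest norm and is removed entirely, after which the four smallest of the nine remaining weights are set to zero. A real system would choose the pruning criteria and the amount of pruning per projection far more carefully than these fixed fractions.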

Mosaic: Composite Projection Pruning for Resource-efficient LLMs
