MixLLM is a new optimization approach for quantizing LLMs. It explores mixed-precision quantization across output features, assigning precision based on each feature's salience viewed globally. By giving larger bit-widths to the output features that need them most, MixLLM achieves good accuracy with low memory consumption. Compared with existing quantization solutions, MixLLM demonstrates superior accuracy together with state-of-the-art system efficiency.
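The core idea can be illustrated with a minimal sketch of salience-driven bit allocation. Everything here is an assumption for illustration: the salience scores, the 4-bit/8-bit split, and the top-fraction threshold are placeholders, not MixLLM's actual metric or configuration.

```python
# Illustrative sketch (not MixLLM's exact method): rank output features by a
# given salience score and keep the most salient fraction at a higher bit-width.

def assign_bitwidths(salience, high_bit_fraction=0.1, high_bits=8, low_bits=4):
    """Return a per-output-feature bit-width list: the top `high_bit_fraction`
    most salient features get `high_bits`; all others get `low_bits`."""
    n = len(salience)
    n_high = max(1, int(n * high_bit_fraction))
    # Indices of features, most salient first.
    ranked = sorted(range(n), key=lambda i: salience[i], reverse=True)
    bits = [low_bits] * n
    for i in ranked[:n_high]:
        bits[i] = high_bits
    return bits

# Hypothetical salience scores for 8 output features.
salience = [0.2, 3.1, 0.5, 0.9, 2.4, 0.1, 0.3, 1.8]
print(assign_bitwidths(salience, high_bit_fraction=0.25))
# → [4, 8, 4, 4, 8, 4, 4, 4]  (features 1 and 4 are most salient)
```

Memory cost then scales with the average assigned bit-width, so keeping only a small salient fraction at higher precision preserves accuracy while staying close to the low-bit memory footprint.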