GuidedQuant is a novel quantization approach that integrates gradient information from the end loss into the quantization objective while preserving cross-weight dependencies within output channels.
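To make the objective concrete, here is a minimal sketch of a gradient-guided, block-wise quadratic quantization error. This is an illustrative stand-in, not GuidedQuant's exact formulation: the function name and the per-output-channel curvature tensor `H` (assumed to be derived from end-loss gradients) are hypothetical.

```python
import numpy as np

def guided_objective(W, W_hat, H):
    """Block-wise quadratic quantization objective (illustrative sketch).

    W, W_hat : (out_channels, in_features) original and quantized weights.
    H        : (out_channels, in_features, in_features) per-output-channel
               curvature estimates, assumed derived from end-loss gradients.
    The quadratic form retains cross-weight interactions *within* each
    output channel instead of scoring each weight independently.
    """
    total = 0.0
    for i in range(W.shape[0]):
        d = W[i] - W_hat[i]
        total += d @ H[i] @ d  # channel-wise quadratic error
    return float(total)
```

With identity curvature blocks this reduces to a plain squared error; non-trivial blocks let errors in correlated weights within a channel compensate for one another.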
This approach improves the performance of existing methods across three quantization formats: weight-only scalar, weight-only vector, and weight-and-activation quantization.
Alongside GuidedQuant, a novel non-uniform scalar quantization algorithm is introduced that is proven to monotonically decrease the quantization objective value, outperforming existing methods in this category.
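The monotonic-decrease guarantee can be illustrated with a generic saliency-weighted Lloyd-style iteration for a non-uniform scalar codebook: each step performs coordinate descent on a weighted squared error, so the objective never increases. This is a simplified stand-in under assumed inputs (weights `w`, per-weight saliencies `s`), not the paper's exact algorithm.

```python
import numpy as np

def weighted_lloyd_step(w, s, centers):
    """One assignment + update step of saliency-weighted Lloyd iteration.

    w       : 1-D array of weights to quantize.
    s       : positive per-weight saliencies (hypothetical, e.g. from gradients).
    centers : current non-uniform codebook values.
    Both sub-steps minimize the weighted squared error over one variable
    group, so the objective is non-increasing across iterations.
    """
    # Assign each weight to its nearest codebook value.
    assign = np.argmin(np.abs(w[:, None] - centers[None, :]), axis=1)
    # Move each center to the saliency-weighted mean of its cluster.
    new_centers = centers.copy()
    for k in range(len(centers)):
        mask = assign == k
        if mask.any():
            new_centers[k] = np.average(w[mask], weights=s[mask])
    return new_centers, assign

def weighted_error(w, s, centers, assign):
    """Saliency-weighted squared quantization error."""
    return float(np.sum(s * (w - centers[assign]) ** 2))
```

Iterating `weighted_lloyd_step` until the codebook stops changing yields a non-uniform quantizer whose objective value decreases monotonically, mirroring the guarantee stated above.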
The code for GuidedQuant is available at https://github.com/snu-mllab/GuidedQuant.