Post-training quantization (PTQ) is a method to reduce a model's memory footprint without retraining.
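For reference, a minimal sketch of the round-to-nearest uniform quantization that PTQ methods typically start from (NumPy; the function name and 3-bit default are illustrative, not taken from the paper):

```python
import numpy as np

def uniform_quantize(w: np.ndarray, bits: int = 3) -> np.ndarray:
    """Round-to-nearest uniform quantization of a weight tensor.

    Maps weights onto 2**bits evenly spaced levels spanning the tensor's
    range, then maps them back to floats (dequantization), which is what
    the model actually runs with after PTQ.
    """
    levels = 2 ** bits - 1
    w_min, w_max = w.min(), w.max()
    scale = (w_max - w_min) / levels
    q = np.round((w - w_min) / scale)   # integer grid indices
    return q * scale + w_min            # dequantized weights

# Example: quantize a random "weight matrix" to 3 bits.
w = np.random.randn(4, 4).astype(np.float32)
w_q = uniform_quantize(w, bits=3)
print(np.abs(w - w_q).max())            # worst-case rounding error
```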
A new mixed-precision PTQ approach called Task-Circuit Quantization (TaCQ) conditions the quantization process on specific weight circuits associated with downstream task performance.
TaCQ preserves these task-specific weights by contrasting the unquantized model weights with a uniformly quantized version of the same weights.
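As a rough illustration of that idea, the sketch below (reusing `uniform_quantize` from above) scores each weight by contrasting it with its quantized value, weighted by a task-loss gradient, and keeps the top-scoring fraction in full precision. The saliency formula, function names, and the 5% keep fraction are assumptions made for illustration, not details from the paper.

```python
def task_saliency(w: np.ndarray, w_quant: np.ndarray, task_grad: np.ndarray) -> np.ndarray:
    """Hypothetical first-order saliency: how much each weight is expected
    to hurt task loss if replaced by its uniformly quantized value.

    The contrast (w - w_quant) measures the damage quantization does to each
    weight; weighting it by a task-loss gradient ties the score to downstream
    task performance. This is an assumed proxy, not TaCQ's exact metric.
    """
    return np.abs(task_grad * (w - w_quant))

def mixed_precision_mask(saliency: np.ndarray, keep_frac: float = 0.05) -> np.ndarray:
    """Boolean mask of the top `keep_frac` most salient weights, which would
    stay in 16-bit while everything else is quantized to the low-bit grid."""
    k = max(1, int(keep_frac * saliency.size))
    threshold = np.partition(saliency.ravel(), -k)[-k]
    return saliency >= threshold

# Toy usage with random stand-ins for real weights and task gradients.
w = np.random.randn(128, 128).astype(np.float32)
w_q = uniform_quantize(w, bits=2)            # from the sketch above
grad = np.random.randn(128, 128).astype(np.float32)
mask = mixed_precision_mask(task_saliency(w, w_q, grad))
w_mixed = np.where(mask, w, w_q)             # salient weights keep full precision
```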
Experimental results show that TaCQ outperforms existing mixed-precision quantization methods, with especially large improvements in the low-bit 2- and 3-bit regime.