Researchers introduce GPTQv2, a finetuning-free quantization method for compressing large-scale transformer architectures. GPTQv2 uses asymmetric calibration to match each quantized layer's output to the corresponding output of the full-precision model, reducing accumulated quantization error. Techniques such as channel parallelization and a Cholesky reformulation are used to parallelize the solution computation in GPTQv2. GPTQv2 demonstrates improved low-bit quantization performance on a 405B-parameter language transformer and the EVA-02 vision transformer.
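To illustrate the idea of asymmetric calibration, the sketch below sets up the per-layer least-squares problem of matching the full-precision layer's output (computed from full-precision activations) using activations produced by the already-quantized preceding layers, and solves it with a Cholesky factorization. This is a minimal, assumed formulation for illustration only; the function name, shapes, and damping are hypothetical and do not reproduce the actual GPTQv2 implementation or its column-wise quantization step.

```python
# Minimal sketch of an asymmetric calibration target (assumed formulation,
# not the actual GPTQv2 code).
import numpy as np

def asymmetric_calibration_target(W_fp, X_fp, X_q, damp=1e-2):
    """Solve  min_W' || W_fp @ X_fp - W' @ X_q ||_F^2  with a Cholesky-based
    normal-equation solve.

    W_fp : (out, in)  full-precision weight of the current layer
    X_fp : (in, n)    calibration activations from the full-precision model
    X_q  : (in, n)    calibration activations from the quantized model
    """
    # Hessian of the least-squares problem, built from quantized-model inputs
    H = X_q @ X_q.T
    # Dampen the diagonal for numerical stability
    H += damp * np.mean(np.diag(H)) * np.eye(H.shape[0])
    # Cross term: full-precision outputs correlated with quantized inputs
    rhs = (W_fp @ X_fp) @ X_q.T
    # Cholesky factorization H = L L^T, then two triangular solves
    # give W' = rhs @ H^{-1}.
    L = np.linalg.cholesky(H)
    tmp = np.linalg.solve(L, rhs.T)
    W_target = np.linalg.solve(L.T, tmp).T
    return W_target
```

In a GPTQ-style pipeline, such a target weight would then be quantized column by column with error compensation; the key difference from symmetric calibration is that the objective compares against the full-precision model's outputs rather than re-using the quantized model's own inputs on both sides.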