Source: Arxiv


GPTQv2: Efficient Finetuning-Free Quantization for Asymmetric Calibration

  • Researchers introduce GPTQv2, a finetuning-free quantization method for compressing large-scale transformer architectures.
  • GPTQv2 uses asymmetric calibration, matching each quantized layer's output to the corresponding output of the full-precision model, which reduces accumulated quantization error.
  • Techniques such as channel parallelization and a Cholesky reformulation parallelize the computation of the calibration solution in GPTQv2.
  • GPTQv2 demonstrates improved low-bit quantization performance on a 405B-parameter language transformer and the EVA-02 vision transformer.
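The asymmetric-calibration idea above can be illustrated with a toy sketch. This is not the paper's implementation: the quantizer, layer sizes, and the closed-form least-squares step are illustrative assumptions. The point is that calibrating against the activations the quantized model actually sees (rather than the full-precision activations, as in symmetric calibration) minimizes the mismatch between the quantized layer's output and the full-precision layer's output.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(w, bits=4):
    # Toy uniform round-to-nearest quantizer (stand-in for a real scheme)
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

# Hypothetical full-precision layer weight and calibration activations
W = rng.standard_normal((8, 16))
X = rng.standard_normal((16, 64))              # activations in the FP model
Xq = X + 0.05 * rng.standard_normal(X.shape)   # activations as seen by the
                                               # quantized model (earlier-layer error)

# Symmetric calibration matches W X against Wq X; the asymmetric objective
# instead matches the full-precision output W X against Wq Xq.
# Closed-form ridge least-squares target before quantization (illustrative):
W_asym = (W @ X @ Xq.T) @ np.linalg.inv(Xq @ Xq.T + 1e-6 * np.eye(16))

err_sym = np.linalg.norm(quantize(W) @ Xq - W @ X)
err_asym = np.linalg.norm(quantize(W_asym) @ Xq - W @ X)
print("symmetric-calibrated error:", err_sym)
print("asymmetric-calibrated error:", err_asym)
```

By construction the least-squares solution `W_asym` minimizes the pre-quantization output mismatch against the drifted activations `Xq`; GPTQv2's contribution is solving this kind of objective efficiently at scale, which is where the channel parallelization and Cholesky reformulation come in.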
