Image Credit: Arxiv

BAQ: Efficient Bit Allocation Quantization for Large Language Models

  • Post-training quantization is widely used to reduce the memory and computational costs of large language models.
  • The paper proposes a framework for allocating quantization bitwidths based on sensitivity metrics derived from a Hessian proxy (see the sketch after this list).
  • The proposed BAQ algorithm achieves a favorable trade-off between loss minimization and computational complexity for large language models.
  • Experimental results show that BAQ outperforms GPTQ, achieving up to 56 times lower perplexity at the same bitwidth.
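
This summary does not spell out BAQ's exact allocation rule, so the following is only a minimal sketch of sensitivity-driven bit allocation under an average-bitwidth budget. The error model `s_i * 2**(-2 * b_i)` and all names here are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def allocate_bits(sensitivities, avg_bits=4, min_bits=2, max_bits=8):
    """Greedily distribute a bit budget across layers.

    Assumed error model (not BAQ's): each layer's quantization
    loss is s_i * 2**(-2 * b_i), i.e. roughly 4x lower per extra
    bit. Each round, the layer whose loss would drop the most
    receives one more bit, until the average-bitwidth budget
    is exhausted.
    """
    n = len(sensitivities)
    bits = np.full(n, min_bits)
    budget = avg_bits * n - bits.sum()  # extra bits to hand out

    def loss(s, b):
        return s * 2.0 ** (-2 * b)

    for _ in range(int(budget)):
        # Marginal loss reduction from granting one extra bit.
        gains = np.array([
            loss(s, b) - loss(s, b + 1) if b < max_bits else -np.inf
            for s, b in zip(sensitivities, bits)
        ])
        bits[np.argmax(gains)] += 1
    return bits

# Hypothetical per-layer sensitivities, e.g. diagonal entries of
# a Hessian proxy such as mean squared input activations.
s = np.array([5.0, 0.3, 1.2, 0.05])
print(allocate_bits(s, avg_bits=4))  # sensitive layers get more bits
```

The greedy loop is just one simple way to trade loss against bitwidth; BAQ derives its allocation from a Hessian-proxy sensitivity analysis, which this toy model only gestures at.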
