Source: Arxiv

Gradual Binary Search and Dimension Expansion: A general method for activation quantization in LLMs

  • Large language models (LLMs) have become pivotal in artificial intelligence, but their deployment on edge devices is hindered by their substantial size.
  • Quantization is a widely used method to reduce memory usage and inference time, but LLMs present unique challenges due to the prevalence of outliers in their activations.
  • In this work, the authors propose a method based on gradual binary search and the use of Hadamard matrices to address the challenges of activation quantization in LLMs.
  • The proposed method enables 3-bit quantization of weights, activations, and key-value (KV) caches, yielding better model performance than state-of-the-art quantization methods.
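The two ideas in the summary — searching for a clipping range per tensor, and rotating activations with an orthogonal Hadamard matrix so outlier energy is spread evenly across channels — can be illustrated on synthetic data. This is a toy sketch, not the paper's implementation: the ternary search over the clipping range stands in for the authors' gradual binary search (whose exact schedule is not given here), the outlier-heavy activations are simulated, and the function names (`hadamard`, `quantize`, `search_step`) are illustrative.

```python
import numpy as np

def hadamard(n):
    # Normalized Hadamard matrix via Sylvester's construction; n must be a power of 2.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quantize(x, step, bits=3):
    # Symmetric uniform quantize/dequantize with step size `step`.
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(x / step), -qmax - 1, qmax) * step

def search_step(x, bits=3, iters=60):
    # Ternary search for the clipping range that minimizes quantization MSE.
    # (Stand-in for the paper's gradual binary search; assumes the MSE is
    # roughly unimodal in the clipping value.)
    qmax = 2 ** (bits - 1) - 1
    mse = lambda c: np.mean((x - quantize(x, c / qmax, bits)) ** 2)
    lo, hi = 1e-8, float(np.abs(x).max())
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if mse(m1) < mse(m2):
            hi = m2
        else:
            lo = m1
    return ((lo + hi) / 2) / qmax

rng = np.random.default_rng(0)
n = 256
x = rng.normal(size=(64, n))
x[:, :4] *= 50.0  # a few large-magnitude channels, mimicking LLM activation outliers

errors = {}
for rotate in (False, True):
    # An orthogonal rotation preserves the MSE, so errors are comparable.
    a = x @ hadamard(n) if rotate else x
    for search in (False, True):
        step = search_step(a) if search else float(np.abs(a).max()) / 3  # naive: scale by abs-max
        key = ("hadamard" if rotate else "plain", "searched" if search else "naive")
        errors[key] = float(np.mean((a - quantize(a, step)) ** 2))

for k, v in sorted(errors.items()):
    print(k, round(v, 3))
```

On data like this, both ingredients reduce 3-bit quantization error: the searched clip stops a handful of outliers from inflating the step size, and the Hadamard rotation flattens the outlier channels so a single per-tensor step fits every channel reasonably well.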
