Source: Arxiv

QuEST: Stable Training of LLMs with 1-Bit Weights and Activations

  • One approach to reducing the costs of large language models (LLMs) is through the use of quantized or sparse representations for training or deployment.
  • While post-training compression methods are popular, there is interest in obtaining more accurate compressed models by directly training over such representations with Quantization-Aware Training (QAT).
  • A recent study suggested that models can be trained with QAT at 8-bit weights and activations while maintaining accuracy.
  • A new method called QuEST advances the state of the art, demonstrating optimality at 4 bits and stable convergence with weights and activations as low as 1 bit.
  • QuEST achieves this through accurate and fast quantization of weights and activations using Hadamard normalization and MSE-optimal fitting, together with a trust gradient estimator that minimizes the error between the noisy quantized gradient and the full-precision gradient (see the sketch after this list).
  • Experiments show that QuEST induces stable scaling laws across various precisions and can be extended to sparse representations.
  • GPU kernel support is provided to efficiently execute models produced by QuEST.
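To make the quantization and trust-estimator ideas above concrete, here is a minimal, hedged sketch of the general recipe rather than the paper's actual implementation: weights are rotated with a Hadamard transform to normalize their distribution, a per-tensor clipping scale is fitted by a small MSE grid search, and the backward pass keeps gradients only where the quantization error is small as a simplified stand-in for the trust estimator. The names (hadamard_matrix, mse_optimal_scale, TrustQuant), the grid-search details, and the fixed trust threshold are illustrative assumptions, not QuEST's exact algorithm.

```python
import torch

def hadamard_matrix(n: int, device=None) -> torch.Tensor:
    """Build an orthonormal n x n Hadamard matrix (n a power of 2) via Kronecker products."""
    H = torch.ones(1, 1, device=device)
    base = torch.tensor([[1.0, 1.0], [1.0, -1.0]], device=device)
    while H.shape[0] < n:
        H = torch.kron(H, base)
    return H / H.shape[0] ** 0.5

def mse_optimal_scale(x: torch.Tensor, bits: int, n_grid: int = 64) -> torch.Tensor:
    """Grid-search a clipping scale that minimizes quantization MSE (illustrative)."""
    qmax = 2 ** (bits - 1) - 1 if bits > 1 else 1
    best_scale, best_err = x.abs().max(), torch.tensor(float("inf"))
    for frac in torch.linspace(0.2, 1.0, n_grid):
        scale = x.abs().max() * frac
        q = torch.clamp((x / (scale / qmax)).round(), -qmax, qmax) * (scale / qmax)
        err = (q - x).pow(2).mean()
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale

class TrustQuant(torch.autograd.Function):
    """Fake-quantize in the forward pass; in the backward pass, propagate
    gradients only where the quantization error is small (simplified 'trust' mask)."""

    @staticmethod
    def forward(ctx, x, bits, trust_tau):
        n = x.shape[-1]
        H = hadamard_matrix(n, device=x.device)
        xr = x @ H                                  # Hadamard normalization (rotation)
        scale = mse_optimal_scale(xr, bits)         # MSE-fitted clipping scale
        qmax = 2 ** (bits - 1) - 1 if bits > 1 else 1
        step = scale / qmax
        q = torch.clamp((xr / step).round(), -qmax, qmax) * step
        ctx.save_for_backward((q - xr).abs() < trust_tau * step, H)
        return q @ H.T                              # rotate back to the original basis

    @staticmethod
    def backward(ctx, grad_out):
        trust_mask, H = ctx.saved_tensors
        # Straight-through-style gradient, masked where quantization error is large.
        grad_rot = (grad_out @ H) * trust_mask
        return grad_rot @ H.T, None, None

# Usage: fake-quantize a weight tensor to 4 bits inside a training step.
w = torch.randn(8, 64, requires_grad=True)
w_q = TrustQuant.apply(w, 4, 0.5)
loss = w_q.pow(2).mean()
loss.backward()
print(w.grad.shape)
```

The fixed threshold trust_tau is only a placeholder for the paper's error-aware trust estimator; the point of the sketch is the overall flow: rotate, fit a scale by MSE, quantize, and filter the backward signal by quantization error.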
