
NestQuant: Nested Lattice Quantization for Matrix Products and LLMs

  • NestQuant is a new post-training quantization (PTQ) method for efficient deployment of large language models (LLMs), based on self-similar nested lattices (a minimal sketch of the nesting idea follows this list).
  • The nested-lattice approach is shown to be information-theoretically optimal for low-precision matrix multiplication; NestQuant is a practical, low-complexity version built on the Gosset lattice.
  • It works as a drop-in quantizer for any matrix multiplication step in an LLM, such as the self-attention and MLP layers.
  • NestQuant quantizes the weights, KV-cache, and activations of the Llama-3-8B model to 4 bits, achieving a perplexity of 6.6 on WikiText-2.
  • This results in more than a 55% reduction in the perplexity gap relative to the unquantized model, outperforming state-of-the-art methods such as Meta's SpinQuant, OstQuant, and QuaRot.
  • Tests on larger models (up to 70B parameters) and across various LLM evaluation benchmarks consistently show the same advantage for NestQuant.
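
For intuition, here is a minimal sketch of the nesting idea behind NestQuant, using the plain integer lattice Z^n in place of the 8-dimensional Gosset lattice the paper builds on; the function names, step size, and nesting ratio q are illustrative assumptions rather than the paper's implementation.

    import numpy as np

    def quantize_nested(x, step=0.1, q=16):
        # Round each coordinate to the nearest point of the fine lattice step * Z^n.
        fine = np.round(x / step)
        # Reduce modulo the coarse lattice (q * step) * Z^n: only the coset
        # index is stored, costing log2(q) bits per coordinate (4 bits for q = 16).
        return np.mod(fine, q).astype(np.int64)

    def dequantize_nested(idx, step=0.1, q=16):
        # Recenter the coset indices into [-q/2, q/2) and rescale. This recovers
        # x up to rounding error whenever x lies inside the coarse cell; values
        # outside it wrap around ("overload"), which the full scheme controls.
        centered = np.where(idx >= q // 2, idx - q, idx).astype(np.float64)
        return centered * step

    rng = np.random.default_rng(0)
    x = rng.normal(scale=0.2, size=8)
    x_hat = dequantize_nested(quantize_nested(x))
    print(np.max(np.abs(x - x_hat)))  # at most step / 2 when nothing overloads

Because both directions are plain rounding and modular arithmetic, the same routine can wrap weights, activations, or KV-cache entries on either side of a matrix product, which is what makes a nested-lattice quantizer usable as a drop-in replacement.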
