Arxiv

NQKV: A KV Cache Quantization Scheme Based on Normal Distribution Characteristics

  • NQKV is a quantization algorithm that reduces the memory consumed by the Key-Value (KV) cache of Large Language Models (LLMs) during inference.
  • It quantizes the KV cache to lower bit widths by exploiting the fact that the elements within each block of the cache are approximately normally distributed.
  • With NQKV, the OPT model can run with a larger batch size or a longer context length, and throughput improves by 9.3x without a significant impact on output quality.
  • Because KV cache memory is a bottleneck for LLM inference, quantizing the cache to fewer bits makes deployment more efficient.
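
The summary does not spell out how NQKV maps the normal-distribution property onto quantization levels. One common way to exploit such a property is quantile-based quantization, where the levels are placed at quantiles of a standard normal so that each level is used about equally often. The sketch below illustrates that general idea on a single block. It is not the NQKV algorithm itself: the function names, the 4-bit width, the midpoint-quantile choice, and the per-block absolute-maximum scale are all assumptions made for illustration.

```python
from statistics import NormalDist


def normal_codebook(bits):
    """Hypothetical helper: 2**bits levels at normal quantiles, scaled to [-1, 1]."""
    n = 2 ** bits
    nd = NormalDist()  # standard normal: mean 0, standard deviation 1
    # Midpoint quantiles avoid the infinite tails at p = 0 and p = 1.
    levels = [nd.inv_cdf((i + 0.5) / n) for i in range(n)]
    top = levels[-1]
    return [v / top for v in levels]


def quantize_block(block, codebook):
    """Return (scale, indices); scale is the block's absolute maximum (assumed scheme)."""
    scale = max(abs(x) for x in block) or 1.0  # guard against an all-zero block
    indices = [
        min(range(len(codebook)), key=lambda k: abs(codebook[k] - x / scale))
        for x in block
    ]
    return scale, indices


def dequantize_block(scale, indices, codebook):
    """Map stored indices back to approximate floating-point values."""
    return [scale * codebook[k] for k in indices]


if __name__ == "__main__":
    codebook = normal_codebook(4)          # 16 levels, i.e. 4 bits per element
    block = [0.5, -1.25, 2.0, 0.1]         # toy stand-in for one KV cache block
    scale, indices = quantize_block(block, codebook)
    print(scale, indices)
    print(dequantize_block(scale, indices, codebook))
```

In this sketch each element is stored as a small integer index plus one shared scale per block, which is where the memory saving would come from. With an even number of levels there is no exact zero in the codebook, so zeros are reconstructed as a small nonzero value; a real scheme may handle this differently.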
