
Source: arXiv

SQuat: Subspace-orthogonal KV Cache Quantization

  • Researchers propose SQuat (Subspace-orthogonal KV cache quantization) to reduce the memory footprint of the key-value (KV) cache used in LLM decoding.
  • SQuat constructs a subspace spanned by query tensors to capture critical task-related information.
  • During key quantization, SQuat enforces that the difference between the (de)quantized and the original keys stays orthogonal to this subspace, minimizing the impact of quantization errors on the attention mechanism's outputs (a toy sketch follows the list).
  • Compared with existing KV cache quantization algorithms, the method reduces memory usage, improves throughput, and achieves better benchmark scores.
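
The core geometric idea fits in a few lines. Below is a minimal NumPy sketch, not the paper's implementation: the uniform quantizer, the SVD-based subspace construction, the rank, and all function names are illustrative assumptions, and the sketch applies the orthogonality correction after quantization, whereas SQuat enforces the constraint during key quantization itself.

```python
import numpy as np

def query_subspace(queries, rank=8):
    # Orthonormal basis for the subspace spanned by query tensors.
    # queries: (num_queries, head_dim)
    _, _, vt = np.linalg.svd(queries, full_matrices=False)
    return vt[:rank].T  # (head_dim, rank)

def toy_quantize(x, bits=2):
    # Uniform quantize-dequantize; a stand-in for the paper's quantizer.
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (2 ** bits - 1) or 1.0
    return np.round((x - lo) / scale) * scale + lo

def subspace_orthogonal_keys(keys, basis, bits=2):
    # Quantize keys, then remove the quantization error's component that
    # lies inside the query subspace; attention logits q @ k are then
    # preserved exactly for any query q in span(basis).
    k_hat = toy_quantize(keys, bits)
    err = k_hat - keys
    return k_hat - err @ basis @ basis.T

# Quick check: for a query inside the subspace, logits are unchanged.
rng = np.random.default_rng(0)
queries = rng.normal(size=(32, 64))
keys = rng.normal(size=(128, 64))
basis = query_subspace(queries)
k_sq = subspace_orthogonal_keys(keys, basis)
q = basis @ rng.normal(size=8)  # a query lying in the subspace
print(np.allclose(q @ keys.T, q @ k_sq.T))  # True
```

The equality holds exactly only for queries inside the subspace; for general queries the error is merely reduced, a trade-off governed by the subspace rank.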
