
Source: arXiv

SQuat: Subspace-orthogonal KV Cache Quantization

  • Researchers propose SQuat (Subspace-orthogonal KV cache quantization) to reduce the memory footprint of the key-value (KV) cache used in LLM decoding.
  • SQuat constructs a subspace spanned by query tensors to capture critical task-related information.
  • During key quantization, SQuat enforces that the difference between the (de)quantized and the original keys stays orthogonal to this subspace, minimizing the impact of quantization errors on the attention mechanism's outputs (a toy sketch follows the list).
  • Compared with existing KV cache quantization algorithms, the method reduces memory usage, improves throughput, and achieves better benchmark scores.
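
The core geometric idea fits in a few lines. Below is a minimal NumPy sketch, not the paper's implementation: the uniform quantizer, the SVD-based subspace construction, the rank, and all function names are illustrative assumptions, and the sketch applies the orthogonality correction after quantization, whereas SQuat enforces the constraint during key quantization itself.

```python
import numpy as np

def query_subspace(queries, rank=8):
    # Orthonormal basis for the subspace spanned by query tensors.
    # queries: (num_queries, head_dim)
    _, _, vt = np.linalg.svd(queries, full_matrices=False)
    return vt[:rank].T  # (head_dim, rank)

def toy_quantize(x, bits=2):
    # Uniform quantize-dequantize; a stand-in for the paper's quantizer.
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (2 ** bits - 1) or 1.0
    return np.round((x - lo) / scale) * scale + lo

def subspace_orthogonal_keys(keys, basis, bits=2):
    # Quantize keys, then remove the quantization error's component that
    # lies inside the query subspace; attention logits q @ k are then
    # preserved exactly for any query q in span(basis).
    k_hat = toy_quantize(keys, bits)
    err = k_hat - keys
    return k_hat - err @ basis @ basis.T

# Quick check: for a query inside the subspace, logits are unchanged.
rng = np.random.default_rng(0)
queries = rng.normal(size=(32, 64))
keys = rng.normal(size=(128, 64))
basis = query_subspace(queries)
k_sq = subspace_orthogonal_keys(keys, basis)
q = basis @ rng.normal(size=8)  # a query lying in the subspace
print(np.allclose(q @ keys.T, q @ k_sq.T))  # True
```

The equality holds exactly only for queries inside the subspace; for general queries the error is merely reduced, a trade-off governed by the subspace rank.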
