Researchers propose SQuat (Subspace-orthogonal KV cache quantization) to reduce the memory footprint of the key-value (KV) cache used during LLM decoding. SQuat constructs a subspace spanned by query tensors to capture the task-critical directions of the attention computation, and enforces that the error between the (de)quantized keys and the original keys stays orthogonal to this subspace, so quantization has minimal impact on the attention scores. The method achieves lower memory usage, higher throughput, and better benchmark scores than existing KV cache quantization algorithms.
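The core idea can be illustrated with a minimal sketch: build an orthonormal basis for the span of recent queries, quantize the cached keys, and then measure the component of the quantization error that falls inside that query subspace, which is the quantity SQuat aims to drive toward zero. This is not the authors' implementation; the rank, the toy uniform quantizer, and all function names below are assumptions made for illustration.

```python
import torch

def query_subspace(queries: torch.Tensor, rank: int = 8) -> torch.Tensor:
    """Orthonormal basis (head_dim x rank) for the subspace spanned by query tensors."""
    # queries: (num_queries, head_dim); SVD yields the principal directions of their span
    _, _, vh = torch.linalg.svd(queries, full_matrices=False)
    return vh[:rank].T                         # (head_dim, rank)

def quantize_dequantize(keys: torch.Tensor, num_bits: int = 2) -> torch.Tensor:
    """Toy per-tensor uniform quantizer standing in for a real low-bit KV quantizer."""
    lo, hi = keys.min(), keys.max()
    scale = (hi - lo) / (2 ** num_bits - 1)
    return torch.round((keys - lo) / scale) * scale + lo

def subspace_error(keys: torch.Tensor, keys_dq: torch.Tensor, basis: torch.Tensor) -> torch.Tensor:
    """Component of the key quantization error lying inside the query subspace.
    SQuat's constraint pushes this component toward zero, so attention scores
    q^T k are barely perturbed even though keys are stored in low precision."""
    err = keys_dq - keys                       # (num_keys, head_dim)
    return err @ basis                         # (num_keys, rank); ideally ~0

# Example: a naive quantizer leaves a visible error inside the query subspace,
# which is exactly the component SQuat's orthogonality constraint removes.
torch.manual_seed(0)
Q = torch.randn(64, 128)                       # recent query tensors
K = torch.randn(256, 128)                      # cached key tensors
basis = query_subspace(Q, rank=8)
K_dq = quantize_dequantize(K, num_bits=2)
print(subspace_error(K, K_dq, basis).abs().mean())
```

The sketch only diagnoses the in-subspace error of an off-the-shelf quantizer; the paper's contribution is a quantization procedure that keeps that error orthogonal to the query subspace by construction.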