
HashEvict: A Pre-Attention KV Cache Eviction Strategy using Locality-Sensitive Hashing

  • Transformer-based large language models (LLMs) use a key-value (KV) cache to accelerate inference by storing the key and value projections of past tokens.
  • HashEvict is an algorithm that uses locality-sensitive hashing (LSH) to compress the KV cache.
  • Before attention is computed, HashEvict quickly locates cached tokens whose keys are cosine-dissimilar to the current query token and marks them for eviction (see the sketch after this list).
  • HashEvict can compress the KV cache by 30%-70% while maintaining high performance across a range of tasks.
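
To make the pre-attention scoring idea concrete, here is a minimal sketch of LSH-based cache eviction using random-hyperplane (SimHash) codes and Hamming-distance scoring. This is not the paper's implementation; the function names, tensor shapes, and parameters (`keep_ratio`, `n_bits`) are illustrative assumptions only.

```python
import torch


def simhash(x: torch.Tensor, planes: torch.Tensor) -> torch.Tensor:
    """Binary LSH code from random hyperplane projections (SimHash).

    Vectors pointing in similar directions (high cosine similarity) agree on
    most sign bits, so Hamming distance between codes approximates angular
    distance."""
    return x @ planes.T > 0


def lsh_evict(keys: torch.Tensor, query: torch.Tensor,
              keep_ratio: float = 0.5, n_bits: int = 16) -> torch.Tensor:
    """Return indices of cached tokens to KEEP for one attention head.

    keys:  (seq_len, head_dim) cached key vectors
    query: (head_dim,) current query vector
    Tokens whose LSH code is most dissimilar to the query's code are evicted.
    """
    planes = torch.randn(n_bits, keys.shape[-1], device=keys.device)
    key_codes = simhash(keys, planes)      # (seq_len, n_bits) booleans
    query_code = simhash(query, planes)    # (n_bits,) booleans
    # Count matching bits per cached token: a cheap proxy for cosine similarity.
    matches = (key_codes == query_code).sum(dim=-1).float()
    n_keep = max(1, int(keep_ratio * keys.shape[0]))
    keep_idx = torch.topk(matches, n_keep).indices
    return torch.sort(keep_idx).values


# Toy usage: 128 cached tokens, head_dim 64, keep half the cache (~50% compression).
keys = torch.randn(128, 64)
query = torch.randn(64)
kept = lsh_evict(keys, query, keep_ratio=0.5)
print(kept.shape)  # torch.Size([64])
```

Because the hashes are computed once per token and compared with cheap bit operations, the cache can be scored against the current query without computing full attention, which is what makes the eviction decision "pre-attention".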
