Transformer-based large language models (LLMs) use the key-value (KV) cache to accelerate inference by storing the key and value embeddings of past tokens. HashEvict is an algorithm that compresses the KV cache using locality-sensitive hashing (LSH): it quickly locates tokens in the cache that are cosine dissimilar to the current query token and evicts them. HashEvict can compress the KV cache by 30%-70% while maintaining high performance across various tasks.
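To make the core idea concrete, below is a minimal sketch of random-hyperplane LSH (SimHash) applied to KV-cache eviction: each key and the current query are hashed to sign bits, and the Hamming distance between hashes approximates angular (cosine) dissimilarity, so the most distant tokens are cheap to find and evict. The function names (`lsh_eviction_scores`, `evict_kv`), the bit width, and the PyTorch framing are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def lsh_eviction_scores(query: torch.Tensor, keys: torch.Tensor,
                        num_bits: int = 16, seed: int = 0) -> torch.Tensor:
    """Score cached keys by approximate cosine dissimilarity to the query.

    SimHash: project onto random hyperplanes and keep the sign bits;
    Hamming distance between bit vectors grows with angular distance.
    (Illustrative sketch, not HashEvict's exact hashing scheme.)
    """
    g = torch.Generator().manual_seed(seed)
    d = query.shape[-1]
    planes = torch.randn(d, num_bits, generator=g)  # random hyperplanes
    q_bits = (query @ planes) > 0                   # (num_bits,) sign bits
    k_bits = (keys @ planes) > 0                    # (T, num_bits) sign bits
    return (q_bits != k_bits).sum(dim=-1)           # (T,) Hamming distances

def evict_kv(keys: torch.Tensor, values: torch.Tensor,
             query: torch.Tensor, keep: int):
    """Keep the `keep` cached tokens most similar to the current query."""
    scores = lsh_eviction_scores(query, keys)
    idx = torch.argsort(scores)[:keep]  # smallest Hamming distance = most similar
    return keys[idx], values[idx]
```

The design point this illustrates: comparing short bit vectors replaces full dot products against every cached key, so dissimilar (low-attention) tokens can be identified and dropped at low cost per decoding step.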