CAOTE (KV Caching through Attention Output Error based Token Eviction) is a method proposed to optimize token eviction in large language models. Token eviction is a post-training technique that reduces the memory and compute cost of the key-value (KV) cache on resource-constrained devices by discarding cached tokens judged to be less important. CAOTE scores each token using both its attention score and its value vector, rather than the attention score alone, which improves accuracy on downstream tasks. It is the first method to use value vector information in combination with attention-based eviction scores.
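The idea of combining attention scores with value vectors can be illustrated with a small sketch. It is not taken from the CAOTE paper's code; it assumes a single query, a single attention head, and that evicting a token renormalizes the remaining attention weights so they still sum to one. Under those assumptions, the change in the attention output caused by evicting token j has the closed form a_j / (1 - a_j) * ||v_j - o||, where a_j is the token's attention weight, v_j its value vector, and o the full attention output. The function name `eviction_error_scores` and the example numbers are hypothetical.

```python
import numpy as np

def eviction_error_scores(attn, values):
    """Per-token change in attention output if that token were evicted.

    attn:   (n,) softmax attention weights of one query (each weight < 1).
    values: (n, d) value vectors of the cached tokens.
    Assumes the remaining weights are renormalized after eviction.
    """
    attn = np.asarray(attn, dtype=float)
    values = np.asarray(values, dtype=float)
    output = attn @ values                          # full attention output o, shape (d,)
    dist = np.linalg.norm(values - output, axis=1)  # ||v_j - o|| for every token j
    return attn / (1.0 - attn) * dist               # a_j / (1 - a_j) * ||v_j - o||

# Toy cache with three tokens (illustrative numbers only).
attn = np.array([0.5, 0.3, 0.2])
values = np.array([[1.0, 0.0],
                   [1.0, 0.0],
                   [0.0, 1.0]])

scores = eviction_error_scores(attn, values)
print("lowest attention weight: token", int(np.argmin(attn)))
print("lowest eviction error:   token", int(np.argmin(scores)))
```

In this toy example the two criteria disagree: token 2 has the lowest attention weight, yet token 1 has the lowest eviction error, because token 2's value vector lies far from the attention output while token 1's value vector duplicates token 0's. An attention-only policy would evict token 2; an output-error policy would evict token 1.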