<ul><li>CAOTE (KV Caching through Attention Output Error based Token Eviction) is a method for improving token eviction from the key-value (KV) cache of large language models.</li><li>Token eviction, which discards selected tokens from the KV cache, is a post-training technique for reducing memory and compute costs on resource-constrained devices.</li><li>CAOTE combines attention scores with value vectors to estimate how much evicting each token would change the attention output, which improves accuracy on downstream tasks.</li><li>It is the first method to incorporate value vector information into attention-based eviction scores.</li></ul>
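The summary above describes CAOTE only at a high level. Below is a minimal sketch of an attention-output-error eviction score. It rests on our own assumptions: a single query and head, softmax weights a_j over n cached tokens, and an error defined as the L2 distance between the attention output o = Σ a_j v_j before and after evicting token j, with the remaining weights renormalized. Under those assumptions the error reduces to a_j / (1 − a_j) · ‖v_j − o‖, a score that combines the attention weight with the value vector. The names `caote_style_scores` and `evict` are hypothetical. This is not the authors' reference implementation.

```python
import numpy as np


def caote_style_scores(attn_weights: np.ndarray, values: np.ndarray) -> np.ndarray:
    """Score each cached token by how much the attention output changes if it is evicted.

    Assumptions (illustrative, not from the CAOTE source):
      attn_weights: shape (n,), softmax weights of one query over n cached tokens,
                    summing to 1, with no single weight equal to 1.
      values:       shape (n, d), value vectors of the cached tokens.
    Returns shape (n,); a lower score means evicting that token perturbs the output less.
    """
    output = attn_weights @ values  # o = sum_j a_j * v_j, shape (d,)
    # Evicting token j and renormalizing gives o_j' = (o - a_j v_j) / (1 - a_j),
    # so o - o_j' = a_j / (1 - a_j) * (v_j - o).
    scale = attn_weights / (1.0 - attn_weights)
    return scale * np.linalg.norm(values - output, axis=1)


def evict(attn_weights: np.ndarray, values: np.ndarray, budget: int) -> np.ndarray:
    """Keep the `budget` highest-scoring tokens; return their indices in cache order."""
    scores = caote_style_scores(attn_weights, values)
    return np.sort(np.argsort(scores)[::-1][:budget])


if __name__ == "__main__":
    # Toy cache: token 0 has the most attention, but its value equals the output.
    attn = np.array([0.5, 0.25, 0.25])
    values = np.array([[0.0, 0.0], [2.0, 0.0], [-2.0, 0.0]])
    print(caote_style_scores(attn, values))
    print(evict(attn, values, budget=2))  # keeps tokens 1 and 2
```

In this toy example, token 0 has the highest attention weight, so a purely attention-based policy would keep it. Its value vector, however, coincides with the attention output, so removing it leaves the output unchanged, and the error-based score evicts it first.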

CAOTE: KV Caching through Attention Output Error based Token Eviction
