Efficiently handling long contexts in transformer-based language models while maintaining low perplexity is an active area of research.
A new approach, CacheFormer, is proposed to tackle this problem: it divides a long context into small segments and attends over them in compressed form. When a segment receives high segment-level attention at the compressed level, CacheFormer retrieves that segment and its nearby segments in uncompressed form.
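As a rough illustration of this retrieval step, the following PyTorch sketch compresses each segment by mean pooling, computes segment-level attention against a query vector, and then fetches the most highly attended segments together with their neighbors in uncompressed form. The function name `segment_cache_retrieval`, the parameters `segment_len`, `top_k`, and `neighbors`, and the mean-pooling compressor are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of compressed segment-level attention with uncompressed
# retrieval of highly attended segments and their neighbors.
# Hyperparameters and the pooling-based compressor are assumptions for
# illustration only, not CacheFormer's actual layers.
import torch
import torch.nn.functional as F

def segment_cache_retrieval(hidden, query, segment_len=32, top_k=4, neighbors=1):
    """hidden: (seq_len, d_model) long-context representations.
    query:  (d_model,) current query vector.
    Returns the uncompressed tokens of the top-k highly attended segments
    plus their neighboring segments, and the retrieved segment indices."""
    seq_len, d_model = hidden.shape
    n_seg = seq_len // segment_len
    segments = hidden[: n_seg * segment_len].view(n_seg, segment_len, d_model)

    # Compressed segment representations (one vector per segment).
    compressed = segments.mean(dim=1)                  # (n_seg, d_model)

    # Segment-level attention computed at the compressed level.
    scores = compressed @ query / d_model ** 0.5       # (n_seg,)
    attn = F.softmax(scores, dim=-1)

    # Segments that receive the highest attention.
    top = torch.topk(attn, k=min(top_k, n_seg)).indices

    # Expand each hit to include nearby segments as well.
    wanted = set()
    for idx in top.tolist():
        for off in range(-neighbors, neighbors + 1):
            j = idx + off
            if 0 <= j < n_seg:
                wanted.add(j)
    keep = sorted(wanted)

    # Retrieved segments in uncompressed form, ready for full attention.
    return segments[keep].reshape(-1, d_model), keep

if __name__ == "__main__":
    hidden = torch.randn(256, 64)   # toy long context
    query = torch.randn(64)
    retrieved, idx = segment_cache_retrieval(hidden, query)
    print(retrieved.shape, idx)
```

In this sketch, the uncompressed tokens returned by the function would feed a subsequent full-attention step, while the rest of the long context is only ever seen in compressed form.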
CacheFormer outperforms existing state-of-the-art architectures, achieving an average perplexity improvement of 8.5% over models of similar size.