Source: Hackernoon

Linear Attention and Long Context Models

  • Linear Attention (LA) is an influential framework that popularized kernel attention and its connection to recurrent autoregressive models.
  • LA has many variants, including Random Feature Attention (RFA), Performer, TransNormer, cosFormer, and Linear Randomized Attention.
  • Efficient attention models beyond kernel attention also exist.
  • Long-context models have become popular, but this work is among the first to demonstrate performance that improves as the context grows longer.
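The connection the first bullet draws between kernel attention and recurrent models can be sketched as follows. By replacing softmax with a feature map φ, attention factors as φ(Q)(φ(K)ᵀV), which can be computed either in parallel or as a running recurrent state. This is a minimal NumPy illustration using the ELU+1 feature map common in the linear-attention literature; the function names and shapes are illustrative, not the article's code.

```python
import numpy as np

def feature_map(x):
    # ELU(x) + 1: a positive feature map commonly used for linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Parallel form: O(n * d^2) instead of softmax attention's O(n^2 * d)."""
    Qp, Kp = feature_map(Q), feature_map(K)   # (n, d)
    KV = Kp.T @ V                             # (d, d_v): aggregate keys/values first
    Z = Kp.sum(axis=0)                        # (d,): normalizer
    return (Qp @ KV) / (Qp @ Z)[:, None]      # (n, d_v)

def causal_linear_attention(Q, K, V):
    """Recurrent form: carry a state S = sum_t phi(k_t) v_t^T, like an RNN."""
    n, d = Q.shape
    Qp, Kp = feature_map(Q), feature_map(K)
    S = np.zeros((d, V.shape[1]))             # running key-value summary
    z = np.zeros(d)                           # running normalizer
    out = np.zeros((n, V.shape[1]))
    for t in range(n):
        S += np.outer(Kp[t], V[t])
        z += Kp[t]
        out[t] = (Qp[t] @ S) / (Qp[t] @ z)
    return out
```

At the final position the causal (recurrent) output matches the parallel form, since the state then summarizes every key-value pair — this equivalence is what lets linear attention run as a constant-memory autoregressive model at inference time.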
