Researchers have developed subquadratic algorithms for computing Attention in Transformers with head dimension d = Theta(log n).
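For concreteness, the computational problem in question can be stated as follows; this is one common formulation, and the exact normalization may vary across papers. Given matrices Q, K, V in R^{n x d}, compute

\[
  \mathrm{Att}(Q,K,V) = D^{-1} A V,
  \qquad
  A = \exp\!\bigl(QK^{\top}\bigr) \in \mathbb{R}^{n \times n},
  \qquad
  D = \mathrm{diag}\!\bigl(A \mathbf{1}_n\bigr),
\]

where exp is applied entrywise. Forming A explicitly already takes Theta(n^2 d) time, which is the quadratic barrier the subquadratic algorithms aim to avoid.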
Subquadratic computation of Attention is feasible when the input matrices have entries bounded in magnitude by B = o(sqrt(log n)), or when the softmax is applied at sufficiently high temperature, again for d = Theta(log n).
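The temperature condition can be read as a rescaling of the same bounded-entry condition; as a sketch, applying the softmax at temperature tau replaces the exponent QK^T by QK^T/tau:

\[
  \mathrm{Att}_{\tau}(Q,K,V) = D_{\tau}^{-1} \exp\!\bigl(QK^{\top}/\tau\bigr) V,
  \qquad
  D_{\tau} = \mathrm{diag}\!\bigl(\exp\!\bigl(QK^{\top}/\tau\bigr)\mathbf{1}_n\bigr),
\]

so a large tau shrinks the magnitude of the exponent's entries and plays much the same role as a small entry bound B.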
Efficient computation of Attention without strong assumptions on the temperature has also been explored, with subquadratic algorithms presented for constant head dimension d = O(1).
The study concludes that in certain parameter regimes the standard quadratic-time algorithm for Attention is essentially optimal under fine-grained complexity assumptions.
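For reference, the "standard algorithm" referred to here is the direct evaluation of the formula above, which costs Theta(n^2 d) time and Theta(n^2) space. The following is a minimal NumPy sketch; the code is illustrative only and is not drawn from any of the cited works, and the function name and interface are invented for this example.

import numpy as np

def standard_attention(Q, K, V, tau=1.0):
    # Naive softmax attention: Theta(n^2 d) time, Theta(n^2) space.
    # Q, K, V are (n, d) arrays; tau is the softmax temperature.
    scores = Q @ K.T / tau                        # (n, n) dot-product matrix: Theta(n^2 d) work
    scores -= scores.max(axis=1, keepdims=True)   # numerical-stability shift; final output is unchanged
    A = np.exp(scores)                            # entrywise exponential
    D_inv = 1.0 / A.sum(axis=1, keepdims=True)    # row normalization, i.e. diag(A 1_n)^{-1}
    return (D_inv * A) @ V                        # (n, d) output: another Theta(n^2 d) work

# Example usage on random inputs.
n, d = 1024, 16
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = standard_attention(Q, K, V)
print(out.shape)  # (1024, 16)

The two matrix products dominate the running time, which is why lower bounds in this line of work target the n^2 term rather than the dependence on d.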