Unneeded elements in the attention's context degrade performance. Selective Attention is introduced as a simple parameter-free change to the standard attention mechanism. Selective Attention consistently improves language modeling and downstream task performance. Selective Attention allows for meaningful reductions in memory and compute requirements during inference.