<ul><li>A formal framework is introduced for analyzing length generalization in transformers with learnable absolute positional encodings.</li><li>The framework characterizes which functions are identifiable from long inputs and proves that length generalization is possible for a wide range of problems.</li><li>Experiments validate the theory as a predictor of both success and failure of length generalization across various tasks.</li><li>The theory explains existing empirical observations and makes it possible to provably predict length generalization capabilities in transformers.</li></ul>

A Formal Framework for Understanding Length Generalization in Transformers
