The study focuses on the neural collapse phenomenon in deep neural networks and its implications for modern architectures such as ResNets and Transformers.
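For reference, the neural collapse conditions most commonly studied in the literature can be summarized as follows; the notation below (per-sample features h_{i,c}, class means \mu_c, global mean \mu_G, classifier rows w_c, number of classes C) is introduced here for illustration and is not drawn from the paper itself.

```latex
% Standard neural collapse conditions (NC1--NC3) as usually stated in the
% literature; notation is ours, not the paper's.
\begin{align*}
\text{(NC1)} &\quad h_{i,c} \to \mu_c \ \text{for every sample } i \text{ of class } c
  && \text{(within-class variability collapses)} \\
\text{(NC2)} &\quad \frac{\langle \mu_c - \mu_G,\ \mu_{c'} - \mu_G \rangle}
  {\lVert \mu_c - \mu_G \rVert \, \lVert \mu_{c'} - \mu_G \rVert} \to -\frac{1}{C-1}
  \ \ (c \neq c')
  && \text{(centered class means form a simplex ETF)} \\
\text{(NC3)} &\quad w_c \propto \mu_c - \mu_G
  && \text{(classifier aligns with centered class means)}
\end{align*}
```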
Whereas existing research has primarily studied data-agnostic models, this paper analyzes data-aware models, proving that the global optima of deep regularized transformers and ResNets exhibit neural collapse.
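As a rough sketch, a result of this kind concerns global minimizers of a regularized empirical risk over the full network parameters; the concrete architecture, loss \ell, and regularization terms used in the paper may differ from the generic form shown here.

```latex
% Generic depth-L regularized training objective (a sketch only; the paper's
% exact setup may differ). f_theta is a ResNet or transformer, lambda > 0.
\[
\min_{\theta = (W_1, \dots, W_L)} \;
  \frac{1}{N} \sum_{i=1}^{N} \ell\bigl(f_\theta(x_i),\, y_i\bigr)
  + \lambda \sum_{l=1}^{L} \lVert W_l \rVert_F^2
\]
```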
Empirically, the paper demonstrates on computer vision and language datasets that neural collapse becomes more pronounced as network depth increases.
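A minimal sketch of how such a depth trend could be measured, using a standard NC1-style metric (within-class scatter relative to between-class scatter) evaluated layer by layer; the feature arrays, labels, and metric choice here are assumptions for illustration, not the paper's exact protocol.

```python
# Minimal sketch (not the paper's code): an NC1-style collapse metric per layer.
import numpy as np

def nc1_metric(features: np.ndarray, labels: np.ndarray) -> float:
    """features: (N, d) representations from one layer; labels: (N,) class ids."""
    classes = np.unique(labels)
    global_mean = features.mean(axis=0)
    d = features.shape[1]
    sigma_w = np.zeros((d, d))  # within-class scatter
    sigma_b = np.zeros((d, d))  # between-class scatter
    for c in classes:
        fc = features[labels == c]
        mu_c = fc.mean(axis=0)
        diff_w = fc - mu_c
        sigma_w += diff_w.T @ diff_w / len(features)
        diff_b = (mu_c - global_mean)[:, None]
        sigma_b += (diff_b @ diff_b.T) * len(fc) / len(features)
    # NC1 ~ trace(Sigma_W Sigma_B^+) / C ; smaller values mean stronger collapse.
    return float(np.trace(sigma_w @ np.linalg.pinv(sigma_b)) / len(classes))

# Hypothetical usage: layer_features is a list of (N, d_l) arrays, one per depth.
# nc1_per_depth = [nc1_metric(f, labels) for f in layer_features]
```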
The theoretical results further show that training deep ResNets and transformers can be reduced to an equivalent unconstrained features model, reinforcing the widespread applicability of that model across settings.
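For context, the unconstrained features model referred to here is usually written as an optimization over free last-layer features H and a linear classifier (W, b); this is the standard form from the neural-collapse literature, with notation chosen here rather than taken from the paper.

```latex
% Unconstrained features model (UFM): last-layer features are treated as free
% optimization variables rather than as outputs of a network.
\[
\min_{W,\; H = [h_1, \dots, h_N],\; b} \;
  \frac{1}{N} \sum_{i=1}^{N} \ell\bigl(W h_i + b,\, y_i\bigr)
  + \lambda_W \lVert W \rVert_F^2
  + \lambda_H \lVert H \rVert_F^2
\]
```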