Source: Arxiv

Infinite-Width Limit of a Single Attention Layer: Analysis via Tensor Programs

  • The paper analyzes the infinite-width limit of a single attention layer in neural networks by leveraging the Tensor Programs framework.
  • Existing Gaussian-based infinite-width theories do not accurately capture attention layers; this study identifies the distribution of the variables in an attention layer without relying on infinite-head approximations or tailored scalings.
  • The resulting limit law is non-Gaussian, with a hierarchical structure: it is Gaussian conditional on random similarity scores (made concrete in the display after this list).
  • Numerical experiments validate the theoretical predictions, showing that the theory accurately describes finite-width, finite-head attention, with implications for deep Transformer architectures (a toy version of such an experiment is sketched below).
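
To make the hierarchical structure concrete, write S for the similarity (query-key) scores that remain random in the limit, f for a single output coordinate, and σ²(S) for the score-dependent conditional variance (our notation, introduced for illustration). The display below paraphrases the claimed structure and is not a formula taken from the paper:

f \mid S \sim \mathcal{N}\bigl(0,\ \sigma^2(S)\bigr), \qquad p(f) = \mathbb{E}_S\bigl[\mathcal{N}\bigl(f;\ 0,\ \sigma^2(S)\bigr)\bigr]

Because S is itself random, the marginal law p(f) is a scale mixture of Gaussians, hence generally non-Gaussian, even though each conditional slice is exactly Gaussian.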

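A minimal numerical sketch of this picture, assuming i.i.d. standard Gaussian weights and a 1/sqrt(n) scaling of the query-key dot products (these dimensions and scalings are our assumptions, not taken from the paper): freezing the query/key weights fixes the similarity scores, so the output should look Gaussian; resampling all weights makes the scores random, and the marginal output becomes a non-Gaussian mixture, visible as positive excess kurtosis.

import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative assumptions): width n, input dim d,
# sequence length T, number of Monte Carlo trials.
n, d, T, trials = 1024, 8, 4, 10000

X = rng.normal(size=(T, d)) / np.sqrt(d)   # fixed input sequence

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def one_output(Wq, Wk, Wv):
    # First coordinate of a single-head attention output at token 0.
    Q, K, V = X @ Wq.T, X @ Wk.T, X @ Wv.T   # each (T, n)
    # Under 1/sqrt(n) scaling the scores converge to a nondegenerate
    # random variable rather than a constant as n grows.
    scores = (Q @ K.T)[0] / np.sqrt(n)
    return softmax(scores) @ V[:, 0]

def excess_kurtosis(s):
    s = (s - s.mean()) / s.std()
    return float(np.mean(s**4) - 3.0)

# Marginal law: resample all weights each trial -> Gaussian mixture.
marginal = np.array([
    one_output(rng.normal(size=(n, d)),
               rng.normal(size=(n, d)),
               rng.normal(size=(n, d)))
    for _ in range(trials)
])

# Conditional law: freeze the query/key weights (hence the similarity
# scores) and resample only the value weights -> Gaussian output.
Wq, Wk = rng.normal(size=(n, d)), rng.normal(size=(n, d))
conditional = np.array([
    one_output(Wq, Wk, rng.normal(size=(n, d)))
    for _ in range(trials)
])

print("excess kurtosis, marginal:   ", excess_kurtosis(marginal))     # typically > 0
print("excess kurtosis, conditional:", excess_kurtosis(conditional))  # near 0

The conditional run is exactly Gaussian at any width (the output is linear in the value weights), while the marginal run mixes over random scores; the gap between the two kurtosis estimates is the non-Gaussian, hierarchical effect the bullets describe.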