Source: arXiv

Mechanistic Insights into Grokking from the Embedding Layer

  • Grokking, the delayed generalization that neural networks can show long after reaching perfect training accuracy, has been linked to the embedding layer in both Transformers and MLPs.
  • Introducing embeddings in MLPs induces delayed generalization in modular arithmetic tasks, highlighting their central role in grokking.
  • The analysis identifies two key mechanisms driving grokking: embedding update dynamics and bilinear coupling between embeddings and downstream weights.
  • Methods like frequency-aware sampling and embedding-specific learning rates are proposed to mitigate bilinear coupling effects and improve grokking dynamics in neural networks.
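
To make the ideas in the summary more concrete, here is a minimal Python sketch. It is not the paper's implementation. It builds a modular addition dataset, applies an inverse-frequency weighting as one hypothetical form of frequency-aware sampling, and runs gradient descent on a scalar toy loss 0.5 * (e * w - y)^2 to show bilinear coupling with separate learning rates for the "embedding" e and the downstream weight w. All function names, the weighting rule, the modulus, and the learning-rate values are assumptions chosen for illustration.

```python
import random
from collections import Counter


def modular_addition_pairs(p):
    """All (a, b, (a + b) % p) triples for the task a + b mod p."""
    return [(a, b, (a + b) % p) for a in range(p) for b in range(p)]


def token_counts(examples):
    """Count how often each input token appears in a training subset."""
    counts = Counter()
    for a, b, _ in examples:
        counts[a] += 1
        counts[b] += 1
    return counts


def frequency_aware_weights(examples, counts):
    """Hypothetical scheme: weight each example by the inverse frequency
    of its rarest token, so rarely seen tokens are sampled more often."""
    return [1.0 / min(counts[a], counts[b]) for a, b, _ in examples]


def bilinear_step(e, w, y, lr_embed, lr_weight):
    """One gradient step on 0.5 * (e * w - y) ** 2 with separate learning rates.

    The gradient for e is scaled by w and the gradient for w is scaled by e,
    which is the coupling; both gradients vanish at e = w = 0.
    """
    residual = e * w - y
    grad_e = residual * w
    grad_w = residual * e
    return e - lr_embed * grad_e, w - lr_weight * grad_w


# Assumed settings for illustration only.
rng = random.Random(0)
p = 7
train = rng.sample(modular_addition_pairs(p), 20)  # partial training split
counts = token_counts(train)
weights = frequency_aware_weights(train, counts)
batch = rng.choices(train, weights=weights, k=8)   # frequency-aware mini-batch

# Toy bilinear model: a small "embedding" e with a larger learning rate
# than the downstream weight w.
e, w = 0.01, 0.5
for _ in range(200):
    e, w = bilinear_step(e, w, 1.0, lr_embed=0.5, lr_weight=0.05)
```

In a real training setup, the same idea of an embedding-specific learning rate would typically be expressed by placing the embedding parameters in their own optimizer parameter group; the scalar version above only illustrates the coupling.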
