Image Credit: Arxiv

Too Big to Think: Capacity, Memorization, and Generalization in Pre-Trained Transformers

  • This study investigates the relationship between memorization and generalization in large language models (LLMs).
  • Pre-training capacity-limited Transformer models from scratch on synthetic character-level tasks reveals a trade-off between memorization and generalization (a sketch of this kind of setup follows the list below).
  • Small models excel at extrapolating to unseen arithmetic cases but fail at memorization, whereas larger models memorize well but struggle with extrapolation.
  • Even an intermediate-capacity model shifts toward memorization rather than generalization.
  • When trained on both tasks jointly, no model size succeeds at extrapolation.
  • The study indicates that pre-training may inherently prioritize one learning mode over the other.
  • By examining these dynamics in a controlled setting, the research provides insight into how model capacity influences learning behavior, with implications for the design and deployment of small language models.
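
To make the experimental framing concrete, here is a minimal, hypothetical sketch in PyTorch of the kind of controlled setup the bullets describe: synthetic character-level arithmetic data with a held-out extrapolation range, and a family of capacity-limited Transformers swept over size. The vocabulary, data splits, sizes (`d_model`, `n_layer`), and the `CharTransformer` class are illustrative assumptions, not the paper's actual code or hyperparameters.

```python
import random
import torch
import torch.nn as nn

VOCAB = list("0123456789+=.")                 # character-level vocabulary
STOI = {ch: i for i, ch in enumerate(VOCAB)}

def make_addition_example(max_operand: int) -> str:
    """One synthetic character-level arithmetic sample, e.g. '12+7=19.'."""
    a, b = random.randint(0, max_operand), random.randint(0, max_operand)
    return f"{a}+{b}={a + b}."

def encode(text: str) -> torch.Tensor:
    return torch.tensor([STOI[ch] for ch in text], dtype=torch.long)

# Training split uses a bounded operand range; the held-out "extrapolation"
# split uses a larger, unseen range (an assumed stand-in for the paper's splits).
train_texts = [make_addition_example(max_operand=99) for _ in range(10_000)]
extrapolation_texts = [make_addition_example(max_operand=999) for _ in range(1_000)]

class CharTransformer(nn.Module):
    """A tiny causal Transformer whose capacity is set by d_model and n_layer."""

    def __init__(self, d_model: int, n_layer: int, n_head: int = 4, max_len: int = 32):
        super().__init__()
        self.embed = nn.Embedding(len(VOCAB), d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_head, dim_feedforward=4 * d_model, batch_first=True
        )
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layer)
        self.head = nn.Linear(d_model, len(VOCAB))

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        seq_len = idx.size(1)
        positions = torch.arange(seq_len, device=idx.device)
        h = self.embed(idx) + self.pos(positions)
        # Causal mask so each character attends only to earlier characters.
        mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=idx.device), diagonal=1
        )
        return self.head(self.blocks(h, mask=mask))

# Capacity sweep: small / intermediate / large models pre-trained from scratch.
configs = {
    "small": dict(d_model=32, n_layer=2),
    "intermediate": dict(d_model=128, n_layer=4),
    "large": dict(d_model=256, n_layer=8),
}
for name, cfg in configs.items():
    model = CharTransformer(**cfg)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params:,} parameters")

# Example: one encoded training sequence, shaped (1, seq_len) for the model.
print(encode(train_texts[0]).unsqueeze(0).shape)
```

In a sweep like this, comparing recall on the training split against accuracy on the larger, unseen operand range is one simple way to separate memorization from extrapolation across model sizes.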
