This study investigates the relationship between memorization and generalization in large language models (LLMs).
Pre-training capacity-limited Transformer models from scratch on synthetic character-level tasks reveals a trade-off between the two behaviors.
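As a concrete illustration of this kind of setup (a minimal sketch under assumed details, not the study's actual data pipeline), the two synthetic character-level tasks can be realized as an addition task whose training split is restricted to small operands, with larger operands held out to probe extrapolation, and a memorization task of arbitrary key-value string pairs. The function names, operand ranges, and alphabet below are illustrative assumptions.

```python
# Illustrative sketch of two synthetic character-level tasks (assumed setup,
# not the paper's released code). Operand ranges and alphabet are arbitrary choices.
import random
import string

def make_arithmetic_split(train_max=99, test_min=100, test_max=999, n_test=500):
    """Character-level addition: train on small operands and hold out larger
    operands so the test set probes extrapolation rather than recall."""
    train = [f"{a}+{b}={a + b}"
             for a in range(train_max + 1)
             for b in range(train_max + 1)]
    test = []
    for _ in range(n_test):
        a = random.randint(test_min, test_max)
        b = random.randint(test_min, test_max)
        test.append(f"{a}+{b}={a + b}")
    return train, test

def make_memorization_pairs(n_pairs=5000, key_len=8, val_len=8):
    """Memorization task: arbitrary key->value string pairs with no shared
    structure, so correct answers require storing each pair individually."""
    rand_str = lambda n: "".join(random.choices(string.ascii_lowercase, k=n))
    return [f"{rand_str(key_len)}:{rand_str(val_len)}" for _ in range(n_pairs)]

if __name__ == "__main__":
    arith_train, arith_test = make_arithmetic_split()
    memo = make_memorization_pairs()
    print(len(arith_train), "arithmetic training strings, e.g.", arith_train[0])
    print(len(arith_test), "held-out extrapolation strings, e.g.", arith_test[0])
    print(len(memo), "memorization strings, e.g.", memo[0])
```

A character-level Transformer of a chosen capacity would then be pre-trained from scratch on these strings, with performance on the held-out operand range measuring generalization and recall of the random pairs measuring memorization.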
Small models excel at extrapolating to unseen arithmetic cases but fail at memorization, whereas larger models memorize effectively but struggle to extrapolate.
An intermediate-capacity model likewise shifts toward memorization at the expense of generalization.
When trained on both tasks together, no model size succeeds at extrapolation.
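A minimal (assumed) sketch of this joint-training condition is simply to shuffle examples from both tasks into one character-level pre-training stream; `make_joint_stream` and its arguments are hypothetical and build on the data-generation sketch above.

```python
# Assumed sketch of the joint-training condition: examples from the arithmetic
# and memorization tasks are shuffled into a single character-level corpus.
import random

def make_joint_stream(arith_train, memo_pairs, seed=0):
    """Interleave both tasks into one pre-training text stream, mirroring the
    'trained on both tasks together' condition."""
    stream = list(arith_train) + list(memo_pairs)
    random.Random(seed).shuffle(stream)   # mix tasks so batches contain both
    return "\n".join(stream)              # newline-delimited character-level corpus
```

Pre-training the same capacity-limited models on this combined corpus then tests whether extrapolation survives when memorization pressure is present.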
The study indicates that pre-training may inherently prioritize one learning mode over the other.
By examining these dynamics in a controlled setting, the study offers insight into how model capacity shapes learning behavior, with implications for the design and deployment of small language models.