Source: VentureBeat
How much information do LLMs really memorize? Now we know, thanks to Meta, Google, Nvidia and Cornell

  • Large Language Models (LLMs) like ChatGPT are trained on massive datasets to develop a statistical understanding of language and the world.
  • LLMs encode statistical patterns from their training data in their parameters, and these learned associations shape their responses.
  • The question of how much LLMs memorize versus generalize has been answered in a study by Meta, Google DeepMind, Cornell, and NVIDIA.
  • The study found that GPT-style models have a fixed memorization capacity of approximately 3.6 bits per parameter.
  • Models do not memorize more with increased data; rather, their fixed capacity is distributed across the dataset.
  • Training on more data forces models to memorize less per sample, leading to safer generalization behavior.
  • Researchers trained transformer models on random bitstrings to quantify how much language models memorize (a rough sketch of this arithmetic follows the list).
  • As dataset size increases, models shift towards learning generalizable patterns, reducing memorization.
  • Increasing model precision showed a modest increase in memorization capacity, with diminishing returns observed.
  • The study provides new tools for evaluating language models' behavior, aiding in transparency and ethical standards in AI development.
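
The two headline quantities in this summary, the roughly 3.6 bits-per-parameter capacity and the bitstring-based measurement of memorization, both reduce to simple arithmetic. Below is a minimal Python sketch of that arithmetic. The model size, dataset size, and negative log-likelihood values are made-up placeholders rather than numbers from the study, and the compression-style memorization estimate is only one plausible way to operationalize the measurement described above, not the authors' exact procedure.

```python
# Reported finding (from the article): GPT-style models store roughly
# 3.6 bits of information per parameter.
BITS_PER_PARAM = 3.6  # approximate figure reported in the study


def estimated_capacity_bits(num_params: int) -> float:
    """Rough total memorization capacity implied by the ~3.6 bits/parameter figure."""
    return BITS_PER_PARAM * num_params


def memorized_bits(num_strings: int, string_len: int, model_nll_bits: float) -> float:
    """One way to quantify memorization of uniformly random bitstrings:
    the bits "saved" when the model compresses the data versus a uniform code.

    num_strings, string_len: size of the random-bitstring dataset.
    model_nll_bits: the model's total negative log-likelihood on that data, in bits
                    (a hypothetical input; obtaining it requires a trained model).
    """
    raw_entropy_bits = num_strings * string_len  # uniform random bits are incompressible
    return max(0.0, raw_entropy_bits - model_nll_bits)


if __name__ == "__main__":
    # Example: a 125M-parameter model (illustrative size, not from the article).
    params = 125_000_000
    cap = estimated_capacity_bits(params)
    print(f"Implied capacity: ~{cap / 8 / 1e6:.0f} MB ({cap:.2e} bits)")

    # Suppose the model's NLL on 10,000 random 64-bit strings is 500,000 bits
    # (a made-up number purely to show the arithmetic).
    mem = memorized_bits(10_000, 64, 500_000.0)
    print(f"Estimated memorized bits: {mem:.0f} of {10_000 * 64} total")
```

Because random bitstrings contain no learnable structure, any compression the model achieves on them can only come from memorization, which is what makes this setup a clean way to isolate memorization from generalization.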
