Diffusion probabilistic models are essential in modern generative AI, but their generalization mechanisms are not well understood.
In highly overparameterized diffusion models, generalization in natural data domains is achieved during training before memorization occurs.
Results show that the memorization time grows in proportion to dataset size, revealing a competition between the time scales of generalization and memorization.
A principled early-stopping criterion that scales with dataset size can therefore optimize generalization while preventing memorization, with implications for hyperparameter transfer and for privacy-sensitive applications.
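To make the criterion concrete, the sketch below sets the stopping step proportional to the number of training examples, stopping before the (assumed) onset of memorization. This is a minimal illustration, not the paper's implementation: the constant `steps_per_example` and the helper names are hypothetical, and in practice the constant would be calibrated (e.g., on a small reference dataset) and transferred to larger ones.

```python
# Minimal sketch of an early-stopping rule whose stopping step scales
# linearly with dataset size. All names and constants here are
# illustrative assumptions, not values from the paper.

def early_stop_step(dataset_size: int, steps_per_example: float = 10.0) -> int:
    """Return the training step at which to stop, proportional to dataset size."""
    return int(steps_per_example * dataset_size)


def train(model_update, dataset, steps_per_example: float = 10.0) -> int:
    """Toy training loop that halts before the assumed memorization onset.

    `model_update` is a user-supplied function performing one gradient step
    on a batch; `dataset` is any indexable collection of training examples.
    """
    stop_at = early_stop_step(len(dataset), steps_per_example)
    for step in range(stop_at):
        batch = dataset[step % len(dataset)]  # simplistic cycling over examples
        model_update(batch)                   # one optimization step
    return stop_at
```

Because the stopping step depends only on the dataset size and a single constant, the same rule can be reused across runs of different scale, which is one way the hyperparameter-transfer implication could play out in practice.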