Source: Arxiv

Superposition Yields Robust Neural Scaling

  • The origin of neural scaling laws in large language models (LLMs) remains unclear.
  • Researchers constructed a toy model of how loss scales with model size, built on two ingredients: representation superposition and feature frequency.
  • Under weak superposition, how the loss scales depends on the feature-frequency distribution; under strong superposition, the loss falls inversely with model dimension (illustrated in the sketch below).
  • Analysis of open-source LLMs shows strong superposition and matches the toy model's predictions, suggesting representation superposition is an important mechanism behind neural scaling laws.
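The 1/dimension behavior in the strong-superposition regime can be illustrated with a toy calculation: when many more features than dimensions are packed as roughly random unit directions, the typical squared overlap (interference) between two feature directions falls off as 1/dimension. The numpy sketch below is only an illustration of that geometric fact, not the paper's actual model; the feature count, dimensions, and the use of squared overlap as a loss proxy are assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_sq_overlap(n_features: int, dim: int, n_trials: int = 20) -> float:
    """Average squared overlap between distinct random unit feature
    directions packed into a dim-dimensional space (an interference proxy)."""
    vals = []
    for _ in range(n_trials):
        W = rng.normal(size=(n_features, dim))
        W /= np.linalg.norm(W, axis=1, keepdims=True)  # unit-norm feature directions
        G = W @ W.T                                    # Gram matrix of pairwise overlaps
        off_diag = G[~np.eye(n_features, dtype=bool)]  # drop self-overlaps on the diagonal
        vals.append(np.mean(off_diag ** 2))
    return float(np.mean(vals))

# Interference shrinks roughly as 1/dim as the model dimension grows.
for dim in [16, 32, 64, 128, 256]:
    print(f"dim={dim:4d}  mean squared overlap={mean_sq_overlap(1024, dim):.5f}  1/dim={1/dim:.5f}")
```

Running this shows the mean squared overlap tracking 1/dim closely, which is the intuition behind loss being inversely proportional to model dimension when features are strongly superposed.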
