techminis

A naukri.com initiative

Image Credit: Arxiv

Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning

  • This work examines the training dynamics that underlie language model scaling laws.
  • Language models experience loss deceleration early in training, producing a piecewise-linear loss curve in log-log space.
  • Scaling up the model mitigates this transition in two ways: it lowers the loss at which deceleration occurs and improves the rate of loss improvement after deceleration.
  • Loss deceleration is attributed to a training dynamic called zero-sum learning (ZSL), in which per-example gradients systematically oppose one another, so that progress on some examples is cancelled by regression on others and overall improvement stalls.
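The opposition of per-example gradients can be made concrete with a simple cancellation metric: compare the norm of the summed gradient to the sum of the individual gradient norms. This is an illustrative sketch, not the paper's exact formulation; the function name and toy gradient vectors are assumptions for demonstration.

```python
import numpy as np

def cancellation_ratio(per_example_grads):
    """Fraction of per-example gradient magnitude that survives summation.

    Near 1 => gradients point the same way (learning is cooperative);
    near 0 => gradients largely cancel, the zero-sum regime.
    Illustrative metric only, not the paper's definition of ZSL.
    """
    g = np.asarray(per_example_grads, dtype=float)
    total = np.linalg.norm(g.sum(axis=0))          # norm of the combined update
    individual = np.linalg.norm(g, axis=1).sum()   # total per-example magnitude
    return total / individual

# Toy per-example gradients in a 2-D parameter space (hypothetical values).
aligned = [[1.0, 0.0], [1.0, 0.1]]    # examples agree on the update direction
opposed = [[1.0, 0.0], [-1.0, 0.0]]   # examples pull in opposite directions

print(cancellation_ratio(aligned))  # close to 1.0
print(cancellation_ratio(opposed))  # 0.0: the gradients fully cancel
```

When this ratio collapses toward zero during training, the model is expending per-example gradient magnitude without making net progress, which is the intuition behind attributing loss deceleration to ZSL.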
