Recent empirical evidence shows that gradient noise in machine learning is often heavy-tailed, challenging the bounded-variance assumption that underlies standard analyses of stochastic optimization.
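A common formalization of this setting, stated here as background since the summary itself does not spell it out, replaces the bounded-variance condition with a bounded p-th central moment of the gradient noise:

```latex
% Classical assumption: the stochastic gradient g(x) has bounded variance,
%   E[ ||g(x) - \nabla f(x)||^2 ] <= \sigma^2 .
% Heavy-tailed relaxation: only a p-th central moment is bounded, with p in (1, 2]:
\mathbb{E}\bigl[\, \| g(x) - \nabla f(x) \|^{p} \,\bigr] \;\le\; \sigma^{p},
\qquad p \in (1, 2].
```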
Gradient clipping is the most common remedy for heavy-tailed noise, but existing theoretical analyses have notable limitations: they rely on large clipping thresholds and yield sub-optimal sample complexities.
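For reference, a minimal sketch of a generic clipped-SGD update follows; the function names, threshold, and step size are illustrative assumptions, not the specific method or constants analyzed in the paper.

```python
import numpy as np

def clipped_sgd_step(x, stoch_grad, step_size=0.01, clip_threshold=1.0):
    """One generic clipped-SGD update: if the stochastic gradient is longer
    than clip_threshold, rescale it down to that length, then step."""
    g = stoch_grad(x)
    g_norm = np.linalg.norm(g)
    if g_norm > clip_threshold:
        g = g * (clip_threshold / g_norm)  # clip: shrink the gradient to the threshold
    return x - step_size * g
```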
Normalized SGD (NSGD) is analyzed as an alternative that overcomes these issues, establishing a parameter-free sample complexity that requires no knowledge of problem parameters, and improved convergence rates when those parameters are known.
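For contrast, here is an equally minimal sketch of a plain normalized-SGD update; again, the interface and step size are assumptions for illustration, not the exact algorithm from the paper.

```python
import numpy as np

def nsgd_step(x, stoch_grad, step_size=0.01, eps=1e-12):
    """One generic normalized-SGD update: divide the stochastic gradient by
    its norm, so every step has length (approximately) step_size regardless
    of how large individual heavy-tailed gradients are."""
    g = stoch_grad(x)
    return x - step_size * g / (np.linalg.norm(g) + eps)  # eps guards against a zero gradient
```

Intuitively, because the update direction has unit norm, the step length is controlled by the step size alone, which is one way to avoid tuning a clipping threshold to an unknown noise scale.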
The analysis of NSGD yields improved sample complexities that match lower bounds for first-order methods, and it establishes high-probability convergence with only a mild dependence on the failure probability.