The success of Shampoo in the AlgoPerf contest has led to a resurgence of interest in Kronecker-factorization-based optimization algorithms for training neural networks.
However, at scale Shampoo relies on heuristics such as learning rate grafting and stale preconditioning, which add complexity, require additional hyperparameter tuning, and lack solid theoretical backing.
This study examines these heuristics through the lens of the Frobenius-norm approximation to full-matrix Adam, decoupling the updates of the preconditioner's eigenvalues from those of its eigenbasis.
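As a concrete illustration of this decoupling, the following NumPy sketch (illustrative only; the toy gradient and the names L, R, Q_L, lam_L are assumptions, not the paper's code) shows how the Kronecker-factored preconditioner for a single weight matrix splits into an eigenbasis rotation and an eigenvalue scaling that can be tracked and updated separately.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 4
G = rng.standard_normal((m, n))      # gradient of a single m x n weight matrix

# Shampoo's Kronecker factors: accumulated gradient outer products.
L = G @ G.T + 1e-6 * np.eye(m)       # left factor  (m x m)
R = G.T @ G + 1e-6 * np.eye(n)       # right factor (n x n)

# Decouple each factor into an eigenbasis (rotation) and eigenvalues (scaling).
lam_L, Q_L = np.linalg.eigh(L)
lam_R, Q_R = np.linalg.eigh(R)

# Shampoo's update L^{-1/4} G R^{-1/4}, written so that the eigenbasis and the
# eigenvalue scaling appear as separate, independently refreshable steps:
G_rot    = Q_L.T @ G @ Q_R                              # rotate into eigenbasis
G_scaled = G_rot / np.outer(lam_L**0.25, lam_R**0.25)   # scale by eigenvalues
update   = Q_L @ G_scaled @ Q_R.T                       # rotate back
```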
It shows how grafting from Adam corrects the staleness and mis-scaling of the preconditioner's eigenvalues, removing the need for learning rate grafting, and proposes adaptive criteria for how often the eigenbasis is recomputed, keeping the resulting approximation error in check and improving convergence.
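A minimal sketch of both ideas, continuing the single-matrix example above; the helper names (adam_scale_in_basis, basis_is_stale), the tolerance tol, and the off-diagonal staleness test are illustrative assumptions under this reading of the abstract, not the paper's exact method.

```python
import numpy as np

def adam_scale_in_basis(G, Q_L, Q_R, v, beta2=0.999, eps=1e-8):
    """Maintain an Adam-style second moment of the gradient rotated into the
    current eigenbasis; its values serve as freshly estimated eigenvalue
    scalings, so stale factor eigenvalues no longer set the step size and no
    separate learning rate grafting is needed (illustrative sketch)."""
    G_rot = Q_L.T @ G @ Q_R                    # gradient in the current eigenbasis
    v[:] = beta2 * v + (1.0 - beta2) * G_rot**2
    return Q_L @ (G_rot / (np.sqrt(v) + eps)) @ Q_R.T

def basis_is_stale(factor, Q, tol=0.1):
    """Adaptive refresh criterion (one plausible choice): recompute the
    eigenbasis once the factor, expressed in the current basis, carries too
    much off-diagonal mass relative to its total Frobenius norm."""
    M = Q.T @ factor @ Q
    off_diag = M - np.diag(np.diag(M))
    return np.linalg.norm(off_diag) > tol * np.linalg.norm(M)

# In a training loop, one would call np.linalg.eigh to refresh Q_L (or Q_R)
# only when basis_is_stale(L, Q_L) fires, amortizing the eigendecomposition.
```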