Polar Express is a new algorithm for computing the polar decomposition and the related matrix sign function, which have become important subroutines in deep learning, notably in the Muon optimization framework.
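For reference, both quantities can be defined through standard factorizations. The NumPy sketch below (the function names are hypothetical, chosen for illustration) computes the orthogonal polar factor from an SVD and the sign of a symmetric matrix from an eigendecomposition. These factorizations tend to map poorly onto GPUs, which is why an iterative method is wanted, but they are useful as ground truth when checking one.

```python
import numpy as np

def polar_via_svd(A):
    """Reference polar decomposition A = Q @ H computed from the SVD."""
    # A = U diag(s) V^T  =>  Q = U V^T (orthonormal columns), H = V diag(s) V^T (symmetric PSD)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt, Vt.T @ np.diag(s) @ Vt

def matrix_sign_via_eigh(S):
    """Matrix sign of a symmetric nonsingular S = W diag(lam) W^T."""
    lam, W = np.linalg.eigh(S)
    return W @ np.diag(np.sign(lam)) @ W.T

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
Q, H = polar_via_svd(A)
print(np.allclose(Q @ H, A), np.allclose(Q.T @ Q, np.eye(3)))  # True True

# For a symmetric S, the orthogonal polar factor coincides with sign(S).
B = rng.standard_normal((4, 4))
S = (B + B.T) / 2
print(np.allclose(polar_via_svd(S)[0], matrix_sign_via_eigh(S)))
```

The symmetric case shows why the two problems are treated together: if S = W diag(λ) Wᵀ, its SVD has singular values |λ|, so U Vᵀ = W diag(sign(λ)) Wᵀ.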
The algorithm is designed to be GPU-friendly and highly efficient, reflecting the requirements of deep learning, where speed and GPU compatibility matter more than high accuracy, which is often unnecessary.
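Classical numerical-analysis methods are poorly suited to this setting: Newton-Schulz converges slowly in its early iterations, and methods based on rational functions rely on QR decompositions or matrix inverses. Like Newton-Schulz, Polar Express uses only matrix-matrix multiplications, which makes it GPU-compatible and efficient.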
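A minimal sketch of the classical cubic Newton-Schulz iteration illustrates both points: each step consists only of matrix-matrix products, but for a small singular value x the update p(x) = 1.5x - 0.5x³ grows it by only about a factor of 1.5, so ill-conditioned inputs need many early iterations. The function name and the fixed step count are assumptions for illustration.

```python
import numpy as np

def newton_schulz_polar(A, steps=30):
    """Classical cubic Newton-Schulz iteration for the polar factor of A."""
    # Frobenius normalization places all singular values in (0, 1],
    # inside the (0, sqrt(3)) region where the iteration converges.
    X = A / np.linalg.norm(A)
    for _ in range(steps):
        # X <- 1.5 X - 0.5 X X^T X: matrix-matrix products only, no inverses or QR
        X = 1.5 * X - 0.5 * X @ (X.T @ X)
    return X

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
U, _, Vt = np.linalg.svd(A, full_matrices=False)
print(np.max(np.abs(newton_schulz_polar(A) - U @ Vt)))  # distance to the SVD-based polar factor
```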
At each iteration, the algorithm adapts its polynomial update rule by solving a minimax optimization problem, which yields both rapid early convergence and fast asymptotic convergence; the authors also address finite-precision issues so that the method remains stable in practice.
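The adaptive idea can be illustrated with a simplified sketch. Assuming the singular values of the normalized input lie in a known interval [l, u], each step picks the odd cubic p(x) = ax + bx³ that minimizes max |1 - p(x)| over that interval, then shrinks the interval to [1 - E, 1 + E], where E is the minimax error. Restricting to cubics is an assumption of this sketch (it gives a closed-form minimax solution via equioscillation at l, the interior peak q, and u); it is not necessarily the polynomial degree Polar Express uses, and the finite-precision safeguards are omitted because the sketch runs in float64. The names `optimal_odd_cubic` and `adaptive_polar` and the default lower bound `l=1e-3` are hypothetical.

```python
import numpy as np

def optimal_odd_cubic(l, u):
    """Coefficients (a, b, E) of p(x) = a*x + b*x**3 minimizing max_{x in [l, u]} |1 - p(x)|.

    Assumes 0 < l <= u. At the optimum the error 1 - p(x) equioscillates at
    x = l, the interior maximum q of p, and x = u:
        p(l) = p(u) = 1 - E,   p(q) = 1 + E,   p'(q) = 0.
    """
    q = np.sqrt((l * l + l * u + u * u) / 3.0)  # from p(l) = p(u) and p'(q) = 0
    a = 2.0 / (l - l**3 / (3.0 * q * q) + 2.0 * q / 3.0)  # from p(l) + p(q) = 2
    b = -a / (3.0 * q * q)
    E = a * q + b * q**3 - 1.0  # E = p(q) - 1
    return a, b, E

def adaptive_polar(A, l=1e-3, steps=12):
    """Greedy adaptive odd-cubic iteration for the polar factor (illustrative sketch).

    Assumes every singular value of A / ||A||_F is at least l (a hypothetical
    user-supplied lower bound). Each step re-fits the cubic to the current
    interval [l, u] known to contain all singular values.
    """
    X = A / np.linalg.norm(A)  # Frobenius normalization: singular values in [l, 1]
    u = 1.0
    for _ in range(steps):
        a, b, E = optimal_odd_cubic(l, u)
        X = a * X + b * X @ (X.T @ X)  # still only matrix-matrix products
        l, u = 1.0 - E, 1.0 + E  # p maps [l, u] into [1 - E, 1 + E]
    return X
```

With l = u = 1 the formulas give a = 1.5 and b = -0.5, the Newton-Schulz cubic, so the scheme falls back to the classical update once the interval has collapsed. When l/u is small, one step improves that ratio by roughly 3√3/2 ≈ 2.6, versus about 1.5 for a fixed Newton-Schulz step.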