SUMO (Subspace-Aware Moment-Orthogonalization) is an optimizer introduced for the efficient training of large language models (LLMs).
SUMO utilizes exact singular value decomposition (SVD) for moment orthogonalization in a low-dimensional subspace, aligning optimization steps with loss landscape spectral characteristics.
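The core idea can be sketched as follows: project the moment matrix into a low-dimensional subspace, then use an exact SVD to replace it with its nearest orthogonal factor. This is an illustrative sketch only, assuming a NumPy setting; the function name, rank parameter, and exact steps are hypothetical and not the authors' precise algorithm.

```python
import numpy as np

def orthogonalize_moment(moment, rank):
    """Hypothetical sketch: restrict `moment` to its top-`rank` singular
    directions and orthogonalize it there via exact SVD."""
    # Exact SVD of the moment matrix.
    u, s, vt = np.linalg.svd(moment, full_matrices=False)
    # Keep only the leading `rank` singular directions (the low-dim subspace).
    u_r, vt_r = u[:, :rank], vt[:rank, :]
    # Orthogonalization: set the retained singular values to 1 (U V^T),
    # aligning the update with the subspace's spectral directions.
    return u_r @ vt_r

rng = np.random.default_rng(0)
m = rng.standard_normal((64, 32))   # stand-in for an accumulated moment
update = orthogonalize_moment(m, rank=8)
# The update's nonzero singular values are all (approximately) 1.
```

Because the SVD is computed exactly but only a rank-`rank` factor is kept, the update is orthogonal within the subspace while the stored state stays small.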
The optimizer improves convergence rate, stability, and final performance, while reducing memory requirements by up to 20% compared to existing methods.
Empirical evaluations confirm the effectiveness of SUMO in accelerating LLM training.