A new framework called GALA (Gradient Alignment-based Learning rate Adaptation) has been proposed for dynamically adjusting the learning rate in large-scale deep learning models.
GALA tracks the alignment between consecutive gradients and combines it with a local curvature estimate to adapt the learning rate: sustained positive alignment suggests the current step size is too conservative and can be increased, while negative alignment signals overshooting and calls for a decrease.
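As a rough illustration of these two signals (a minimal sketch, not the paper's implementation: the helper names, the cosine-similarity choice for alignment, and the secant-style curvature proxy are assumptions made here), one might compute them as follows:

```python
import torch

def gradient_alignment(g_prev: torch.Tensor, g_curr: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between consecutive (flattened) gradient vectors."""
    return torch.dot(g_prev, g_curr) / (g_prev.norm() * g_curr.norm() + 1e-12)

def local_curvature(g_prev: torch.Tensor, g_curr: torch.Tensor,
                    step: torch.Tensor) -> torch.Tensor:
    """Secant-style curvature proxy along the last step:
    ||g_t - g_{t-1}|| / ||x_t - x_{t-1}|| (one common choice, assumed here)."""
    return (g_curr - g_prev).norm() / (step.norm() + 1e-12)

# Toy illustration on flattened gradients from two consecutive iterations.
g0, g1 = torch.randn(10), torch.randn(10)
dx = -0.01 * g0  # the parameter step that produced g1
print(float(gradient_alignment(g0, g1)), float(local_curvature(g0, g1, dx)))
```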
The method casts learning rate selection as a one-dimensional online learning problem and solves it with an online learning algorithm such as Follow-the-Regularized-Leader (FTRL).
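To make the online learning view concrete, the sketch below runs FTRL on a single scalar, taken here to be the log learning rate, with a quadratic regularizer and linearized losses. The class name, the anchoring at the initial learning rate, and the sign convention for the hypergradient are illustrative assumptions, not GALA's actual formulation:

```python
import math

class FTRL1D:
    """FTRL on a scalar decision variable x (here: the log learning rate).

    With linearized losses g_s * x and quadratic regularizer
    (lam/2) * (x - x0)^2, the FTRL iterate has the closed form
        x_{t+1} = argmin_x  sum_{s<=t} g_s * x + (lam/2) * (x - x0)^2
                = x0 - (1/lam) * sum_{s<=t} g_s.
    """
    def __init__(self, lam: float = 10.0, x0: float = math.log(1e-3)):
        self.lam = lam
        self.x0 = x0          # anchor: log of the initial learning rate
        self.grad_sum = 0.0   # running sum of per-step loss gradients

    def step(self, g: float) -> float:
        """Feed the latest loss gradient w.r.t. x; return the new learning rate."""
        self.grad_sum += g
        x = self.x0 - self.grad_sum / self.lam  # closed-form FTRL iterate
        return math.exp(x)

# Illustrative use: positive alignment pushes the rate up, negative alignment
# (overshooting) pushes it down. The sign convention g = -alignment is assumed.
ftrl = FTRL1D()
for alignment in [0.9, 0.7, -0.3, -0.8]:
    lr = ftrl.step(-alignment)
    print(f"alignment={alignment:+.1f} -> lr={lr:.2e}")
```

Working in log space keeps the learning rate positive and turns multiplicative adjustments into additive ones, a common design choice in learning rate adaptation schemes.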
Empirical results show that optimizers such as SGD and Adam, when combined with GALA, perform well across a wide range of initial learning rates without extensive tuning.