Low-rank optimization is used when training large language models to reduce the memory footprint of adaptive optimizers by restricting learning to a lower-dimensional subspace.
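As a rough, hypothetical illustration of why this saves memory (the layer size and rank below are made up, not taken from the paper), an Adam-style optimizer that keeps its two moment tensors in a projected r-dimensional space stores far less state per layer:

```python
# Hypothetical sizes: a 4096 x 4096 weight matrix, projection rank r = 128.
m = n = 4096
r = 128

full_state    = 2 * m * n  # Adam keeps two moment tensors over the full parameter space
lowrank_state = 2 * r * n  # moments kept in the projected r x n space instead

print(f"full-rank optimizer state: {full_state * 4 / 2**20:.1f} MiB (fp32)")   # ~128 MiB
print(f"low-rank optimizer state:  {lowrank_state * 4 / 2**20:.1f} MiB (fp32)") # ~4 MiB
```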
A new two-step procedure is proposed to efficiently approximate SVD-based gradient projections in large models.
The procedure first constructs an orthogonal basis from Discrete Cosine Transform (DCT) matrices and then adaptively selects the basis columns that best align with each layer's gradient.
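A minimal NumPy sketch of these two steps, under the assumption that the basis is the orthonormal DCT-II matrix and that columns are scored by the norm of their alignment with the layer gradient; the function names, layer shapes, and rank are illustrative, not the paper's actual API:

```python
import numpy as np
from scipy.fft import dct

def dct_basis(m: int) -> np.ndarray:
    """Orthonormal DCT-II matrix; its columns form an orthogonal basis of R^m."""
    return dct(np.eye(m), type=2, norm="ortho", axis=0)

def select_columns(grad: np.ndarray, basis: np.ndarray, rank: int) -> np.ndarray:
    """Pick the `rank` basis columns most aligned with the layer gradient."""
    scores = np.linalg.norm(basis.T @ grad, axis=1)  # alignment score per basis column
    top = np.sort(np.argsort(scores)[-rank:])        # indices of the top-`rank` columns
    return basis[:, top]                             # (m, rank) projection matrix

# Illustrative usage on a random "gradient" for a 256 x 128 layer, rank 32.
m, n, r = 256, 128, 32
G = np.random.randn(m, n)
P = select_columns(G, dct_basis(m), r)
G_low  = P.T @ G   # (r, n) projected gradient fed to the adaptive optimizer
G_full = P @ G_low # lift the low-rank update back to the full parameter space
```

Because the DCT basis is fixed and orthogonal, only the column scores need to be refreshed between steps, which is presumably what lets the procedure avoid recomputing an SVD of each layer's gradient.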
The method achieves near-optimal low-rank projections, matching the performance of SVD-based methods while being faster to compute and using less memory.