Low-rank optimization is used when training large language models to reduce the memory footprint of adaptive optimizers by restricting learning to a lower-dimensional subspace.
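As a rough, hypothetical illustration of why this saves memory (the layer size and rank below are made up, not taken from the paper), an Adam-style optimizer that keeps its two moment tensors in a projected r-dimensional space stores far less state per layer:

```python
# Hypothetical sizes: a 4096 x 4096 weight matrix, projection rank r = 128.
m = n = 4096
r = 128

full_state    = 2 * m * n  # Adam keeps two moment tensors over the full parameter space
lowrank_state = 2 * r * n  # moments kept in the projected r x n space instead

print(f"full-rank optimizer state: {full_state * 4 / 2**20:.1f} MiB (fp32)")   # ~128 MiB
print(f"low-rank optimizer state:  {lowrank_state * 4 / 2**20:.1f} MiB (fp32)") # ~4 MiB
```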
A new two-step procedure is proposed to efficiently approximate SVD-based gradient projections in large models.
The procedure first constructs an orthogonal basis from Discrete Cosine Transform (DCT) matrices and then adaptively selects the basis columns that best align with each layer's gradient.
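A minimal NumPy sketch of these two steps, under the assumption that the basis is the orthonormal DCT-II matrix and that columns are scored by the norm of their alignment with the layer gradient; the function names, layer shapes, and rank are illustrative, not the paper's actual API:

```python
import numpy as np
from scipy.fft import dct

def dct_basis(m: int) -> np.ndarray:
    """Orthonormal DCT-II matrix; its columns form an orthogonal basis of R^m."""
    return dct(np.eye(m), type=2, norm="ortho", axis=0)

def select_columns(grad: np.ndarray, basis: np.ndarray, rank: int) -> np.ndarray:
    """Pick the `rank` basis columns most aligned with the layer gradient."""
    scores = np.linalg.norm(basis.T @ grad, axis=1)  # alignment score per basis column
    top = np.sort(np.argsort(scores)[-rank:])        # indices of the top-`rank` columns
    return basis[:, top]                             # (m, rank) projection matrix

# Illustrative usage on a random "gradient" for a 256 x 128 layer, rank 32.
m, n, r = 256, 128, 32
G = np.random.randn(m, n)
P = select_columns(G, dct_basis(m), r)
G_low  = P.T @ G   # (r, n) projected gradient fed to the adaptive optimizer
G_full = P @ G_low # lift the low-rank update back to the full parameter space
```

Because the DCT basis is fixed and orthogonal, only the column scores need to be refreshed between steps, which is presumably what lets the procedure avoid recomputing an SVD of each layer's gradient.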
The method achieves near-optimal low-rank projections, matching the performance of SVD-based methods while being faster to compute and using less memory.