Large language models (LLMs) have grown rapidly, with some recent models containing trillions of parameters, creating substantial memory and compute challenges during training.
Efforts to address these challenges include parameter-efficient approaches such as LoRA, which have proven effective for fine-tuning but are harder to apply to pre-training, where the model must learn vast datasets from scratch rather than adapt an already-trained one.
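For intuition, LoRA constrains each weight update to a product of two low-rank matrices, so only a small fraction of parameters is trained. A minimal PyTorch-style sketch of the idea (class and parameter names here are illustrative, not taken from the study):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B A x."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # base weights stay frozen
        # Low-rank factors: A projects down to rank r, B projects back up.
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T) @ self.B.T
```

With r much smaller than the layer dimensions, the trainable factors add only r * (in_features + out_features) parameters per layer, which is the source of the memory savings.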
The study aims to determine whether parameter- or memory-efficient methods can make pre-training more efficient while maintaining performance comparable to full-model training, and proposes practical techniques such as weight refactorization and momentum reset toward this goal.
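One way to read these two techniques (a sketch of the general idea, not necessarily the study's exact procedure): periodically merge the low-rank factors into the base weights and restart both the factors and their optimizer state, so that successive low-rank updates can accumulate into a high-rank change. Building on the illustrative LoRALinear above:

```python
import torch

@torch.no_grad()
def refactorize_and_reset(layer, optimizer):
    """Merge the current low-rank update into the base weights,
    reinitialize the factors, and clear their optimizer momentum.
    `layer` is the LoRALinear sketch above; names are illustrative."""
    # Weight refactorization: fold the rank-r update B @ A into the base matrix.
    layer.base.weight += layer.scaling * (layer.B @ layer.A)
    # Restart the factors so the next training segment learns a fresh update.
    layer.A.normal_(mean=0.0, std=0.01)
    layer.B.zero_()
    # Momentum reset: drop stale optimizer statistics for the reinitialized factors,
    # since they refer to the parameters that were just folded away.
    for p in (layer.A, layer.B):
        if p in optimizer.state:
            optimizer.state[p] = {}
```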
Benchmark evaluations of memory-efficient pre-training approaches show that full-rank training with the right optimizer and hyperparameters delivers the best performance, and that incorporating high-rank updates is crucial for low-rank approaches to close the gap.
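The high-rank point can be seen numerically: each individual low-rank update has rank at most r, but the sum of several independent rank-r updates (as accumulated through refactorization) generically has much higher rank. A small illustrative check, not taken from the benchmark itself:

```python
import torch

torch.manual_seed(0)
d, r, steps = 64, 4, 8

# Accumulate several independent rank-r updates into one weight matrix.
W = torch.zeros(d, d)
for _ in range(steps):
    B = torch.randn(d, r)
    A = torch.randn(r, d)
    W += B @ A

# A single update has rank 4; eight independent ones generically reach rank 32.
print(torch.linalg.matrix_rank(W).item())  # typically prints 32
```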