Source: Arxiv

Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking

  • Large language models (LLMs) have grown rapidly, with some recent models containing trillions of parameters, which creates substantial memory and compute challenges during training.
  • Prior efforts to address these challenges include low-rank approaches such as LoRA, which are effective for fine-tuning but harder to apply to pre-training, where the model must learn from vast datasets.
  • The study asks whether parameter- or memory-efficient methods can improve pre-training efficiency while matching the performance of full-model training, and proposes practical techniques such as weight refactorization and momentum reset to achieve this (a sketch of these ideas follows this list).
  • Benchmark evaluations of memory-efficient pre-training approaches show that full-rank training with the right optimizer and hyperparameters delivers the best performance, and that incorporating high-rank updates into low-rank approaches is crucial for closing the gap.

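The summary above does not spell out how weight refactorization and momentum reset are implemented, so the following is only a minimal illustrative sketch of the general idea: train a LoRA-style low-rank update on top of a frozen base weight, periodically fold the update back into the base weight (so the accumulated change can become high-rank over time), and reset the optimizer's momentum state after each merge. Names such as `LowRankLinear`, `rank`, and `merge_every`, and the placeholder loss, are assumptions, not the paper's actual code.

```python
# Illustrative sketch only; not the paper's implementation.
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Frozen full-rank weight W plus a trainable low-rank update B @ A."""
    def __init__(self, d_in, d_out, rank=32):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * d_in ** -0.5,
                                   requires_grad=False)            # frozen base weight
        self.A = nn.Parameter(torch.randn(rank, d_in) * d_in ** -0.5)  # trainable factor
        self.B = nn.Parameter(torch.zeros(d_out, rank))                # trainable factor

    def forward(self, x):
        # Effective weight is the base plus the current low-rank update.
        return x @ (self.weight + self.B @ self.A).t()

    @torch.no_grad()
    def refactorize(self):
        """Weight refactorization: fold the low-rank update into the base
        weight, then restart the factors so subsequent steps can explore new
        directions; repeated merges let the total update become high-rank."""
        self.weight += self.B @ self.A
        self.A.normal_(std=self.A.shape[1] ** -0.5)
        self.B.zero_()

def train(model, data_loader, steps=10_000, merge_every=1_000, lr=1e-3):
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.AdamW(params, lr=lr)
    for step, (x, y) in zip(range(steps), data_loader):
        loss = nn.functional.mse_loss(model(x), y)  # placeholder loss for the sketch
        opt.zero_grad()
        loss.backward()
        opt.step()
        if (step + 1) % merge_every == 0:
            for m in model.modules():
                if isinstance(m, LowRankLinear):
                    m.refactorize()
            # Momentum reset: the old Adam moments describe the pre-merge factors,
            # so the optimizer state is discarded after each refactorization.
            opt = torch.optim.AdamW(params, lr=lr)
```

Under these assumptions, rebuilding the optimizer is the simplest way to express a momentum reset; clearing `opt.state` in place would have the same effect.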
Read Full Article
