Source: arXiv
DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster

  • Researchers have introduced DiLoCoX, a low-communication framework for large-scale training of large language models over decentralized clusters.
  • DiLoCoX combines Pipeline Parallelism, a Dual Optimizer Policy, One-Step-Delay Overlap of Communication and Local Training, and an Adaptive Gradient Compression Scheme to improve the scalability and speed of model pre-training (a rough sketch of the outer/inner training loop follows this list).
  • The framework enables pre-training of a 107B-parameter foundation model over a 1 Gbps network, achieving a 357x speedup in distributed training compared to vanilla AllReduce with negligible impact on model convergence.
  • This marks the first successful application of a decentralized training framework to models exceeding 100 billion parameters.
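
The sketch below illustrates the kind of DiLoCo-style dual-optimizer loop that DiLoCoX builds on: many local inner-optimizer steps per worker, then a single exchange of compressed parameter deltas applied by an outer optimizer. It is a minimal single-process simulation under stated assumptions: the toy model, data, hyperparameters, and the top-k compressor are illustrative stand-ins, pipeline parallelism is omitted, and the communication that DiLoCoX overlaps with the next round's local compute (one-step delay) is done synchronously here for clarity. It is not the authors' implementation.

```python
# Minimal sketch of a dual-optimizer (inner/outer) low-communication loop.
# Assumptions: toy linear model, synthetic data, top-k compression stand-in,
# synchronous sync instead of DiLoCoX's one-step-delay overlap.
import copy
import torch

def topk_compress(delta, ratio=0.05):
    # Stand-in for the adaptive gradient compression scheme: keep only the
    # largest-magnitude entries of the parameter delta (assumption).
    flat = delta.flatten()
    k = max(1, int(flat.numel() * ratio))
    idx = flat.abs().topk(k).indices
    out = torch.zeros_like(flat)
    out[idx] = flat[idx]
    return out.view_as(delta)

def outer_round(workers, global_params, outer_opt, inner_steps=50):
    deltas = []
    for model, inner_opt, batches in workers:
        # Inner optimizer: many local AdamW steps with no communication.
        for x, y in batches[:inner_steps]:
            inner_opt.zero_grad()
            torch.nn.functional.mse_loss(model(x), y).backward()
            inner_opt.step()
        # Pseudo-gradient: global parameters minus locally updated parameters.
        deltas.append([topk_compress(gp.detach() - p.detach())
                       for gp, p in zip(global_params, model.parameters())])
    # The only communication step: average compressed deltas across workers.
    # DiLoCoX hides this exchange behind the next round's local compute
    # (one-step-delay overlap); here it is synchronous for simplicity.
    avg = [torch.stack(ds).mean(0) for ds in zip(*deltas)]
    for p, g in zip(global_params, avg):
        p.grad = g
    outer_opt.step()      # Outer optimizer applied to the averaged deltas.
    outer_opt.zero_grad()
    # Broadcast the updated global parameters back to every worker.
    with torch.no_grad():
        for model, _, _ in workers:
            for p, gp in zip(model.parameters(), global_params):
                p.copy_(gp)

# Toy usage: 4 workers, a tiny model, synthetic regression batches.
torch.manual_seed(0)
base = torch.nn.Linear(8, 1)
global_params = [torch.nn.Parameter(p.detach().clone()) for p in base.parameters()]
outer_opt = torch.optim.SGD(global_params, lr=0.7, momentum=0.9, nesterov=True)
workers = []
for _ in range(4):
    model = copy.deepcopy(base)
    data = [(torch.randn(32, 8), torch.randn(32, 1)) for _ in range(50)]
    workers.append((model, torch.optim.AdamW(model.parameters(), lr=1e-3), data))
for _ in range(3):
    outer_round(workers, global_params, outer_opt)
```

Because workers only exchange compressed parameter deltas once per outer round rather than gradients every step, the communication volume drops by orders of magnitude, which is what makes training over a slow (e.g., 1 Gbps) decentralized network feasible.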
