Researchers have introduced DiLoCoX, a low-communication framework for large-scale decentralized-cluster training of large language models.
DiLoCoX combines Pipeline Parallelism with a Dual Optimizer Policy, One-Step-Delay Overlap of communication and local training, and an Adaptive Gradient Compression Scheme to improve the scalability and speed of model pre-training.
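To make the dual-optimizer and one-step-delay ideas concrete, here is a minimal, illustrative sketch of a training loop in that spirit: each worker runs many local steps with an inner optimizer, exchanges pseudo-gradients asynchronously, and applies the previous round's averaged pseudo-gradient with an outer optimizer while the current round's communication is still in flight. All names, hyperparameters, and the loop structure below are assumptions for illustration, not DiLoCoX's actual API; pipeline parallelism and gradient compression are omitted for brevity.

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F


def train(model, data_iter, inner_steps=100, sync_rounds=50):
    # Inner optimizer: ordinary local training on each worker.
    inner_opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    # Outer optimizer: applies the globally averaged pseudo-gradients.
    outer_opt = torch.optim.SGD(model.parameters(), lr=0.7,
                                momentum=0.9, nesterov=True)

    snapshot = [p.detach().clone() for p in model.parameters()]
    pending = None  # (handles, deltas) from the previous round, still in flight

    for _ in range(sync_rounds):
        # 1) Local training: inner_steps of normal forward/backward/update.
        for _ in range(inner_steps):
            x, y = next(data_iter)
            F.cross_entropy(model(x), y).backward()
            inner_opt.step()
            inner_opt.zero_grad()

        # 2) Pseudo-gradient: parameter drift since the last snapshot.
        deltas = [s - p.detach() for s, p in zip(snapshot, model.parameters())]

        # 3) One-step delay: apply the *previous* round's averaged
        #    pseudo-gradient, whose all-reduce overlapped with step 1.
        if pending is not None:
            handles, prev_deltas = pending
            for h in handles:
                h.wait()
            world = dist.get_world_size()
            for p, d in zip(model.parameters(), prev_deltas):
                p.grad = d / world
            outer_opt.step()
            outer_opt.zero_grad()

        # 4) Launch this round's all-reduce asynchronously; it overlaps with
        #    the next round's local training. (A compressed payload would be
        #    sent here under an adaptive gradient compression scheme.)
        handles = [dist.all_reduce(d, async_op=True) for d in deltas]
        pending = (handles, deltas)
        snapshot = [p.detach().clone() for p in model.parameters()]
```

The key design point this sketch illustrates is that synchronization never blocks compute: the expensive exchange of pseudo-gradients runs in the background during the next block of local steps, at the cost of applying each global update one round late.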
The framework enables pre-training a 107B-parameter foundation model over a 1 Gbps network, achieving a 357x speedup over vanilla AllReduce with negligible impact on model convergence.
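A rough back-of-envelope calculation shows why low-communication techniques matter at this scale; the figures below are assumptions (bf16 parameters, an ideal 1 Gbps link, one full-model exchange per synchronization, ignoring AllReduce topology factors), not numbers from the paper.

```python
# Back-of-envelope: cost of naively synchronizing a 107B-parameter model
# over a 1 Gbps link (assumed bf16 precision, ideal bandwidth).
params = 107e9                               # 107B parameters
bytes_per_param = 2                          # bf16
payload_gb = params * bytes_per_param / 1e9  # ~214 GB per synchronization
link_gbps = 1                                # 1 Gbps network
seconds_per_sync = payload_gb * 8 / link_gbps
print(f"~{payload_gb:.0f} GB per sync, ~{seconds_per_sync / 60:.0f} min "
      "just to move one model copy")
```

Under these assumptions a single full-model exchange alone would take on the order of half an hour, which is why reducing both the frequency and the volume of communication is central to training over such links.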
This marks the first successful application of a decentralized training framework to models exceeding 100 billion parameters.