HALoS is a hierarchical asynchronous optimization framework designed for training large language models (LLMs) in geo-distributed environments.
It introduces local parameter servers (LPSs) within each region and a global parameter server (GPS) to minimize inter-region communication costs and improve training efficiency.
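To make the two-level flow concrete, here is a minimal single-process sketch of a hierarchical parameter-server update loop: workers in each region push gradients to their local server, which steps asynchronously and only occasionally ships its accumulated delta to the global server. All class names, the delta-merge rule, and the sync period are illustrative assumptions for this sketch, not HALoS's actual algorithm.

```python
import numpy as np

class LocalParameterServer:
    """Aggregates gradient updates from workers within one region (hypothetical sketch)."""
    def __init__(self, global_params, lr=0.1, sync_every=4):
        self.params = global_params.copy()
        self.base = global_params.copy()  # snapshot taken at the last global sync
        self.lr = lr
        self.sync_every = sync_every      # local steps between inter-region syncs
        self.steps = 0

    def apply_gradient(self, grad):
        # Asynchronous local step: no coordination with other regions.
        self.params -= self.lr * grad
        self.steps += 1
        return self.steps % self.sync_every == 0  # time to talk to the GPS?

    def delta(self):
        # Accumulated local progress since the last global sync.
        return self.params - self.base

class GlobalParameterServer:
    """Merges region deltas asynchronously (simple additive rule, assumed)."""
    def __init__(self, params):
        self.params = params.copy()

    def merge(self, lps):
        # Fold in the region's delta, then refresh that region's base copy.
        self.params += lps.delta()
        lps.base = self.params.copy()
        lps.params = self.params.copy()

# Toy run: two "regions" jointly minimize f(x) = ||x||^2 (gradient = 2x).
gps = GlobalParameterServer(np.array([4.0, -4.0]))
regions = [LocalParameterServer(gps.params) for _ in range(2)]

for step in range(32):
    lps = regions[step % 2]        # regions make progress independently
    grad = 2.0 * lps.params        # local gradient of ||x||^2
    if lps.apply_gradient(grad):   # periodic, cheap inter-region sync
        gps.merge(lps)

print(np.linalg.norm(gps.params))  # ends well below the starting norm of ~5.66
```

Note that each region's delta is computed against a possibly stale snapshot of the global parameters, so the merged iterate oscillates rather than decreasing monotonically; handling that staleness is precisely the kind of issue a real asynchronous scheme must address.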
In geo-distributed LLM training, HALoS converges up to 7.5x faster than synchronous baselines and up to 2.1x faster than existing asynchronous methods.
The framework maintains final model quality while reducing total training time, making it well suited to scalable, efficient training of large language models in heterogeneous, geo-distributed settings.