Decentralized learning is widely viewed as a scalable alternative to traditional parameter-server-based training, but its effectiveness is limited by the restricted peer-to-peer communication between workers.
The researchers studied how communication is scheduled over the course of decentralized training and found that concentrating communication in the later stages of training improves global generalization.
The study further revealed that fully connected communication combined with a single global merge of all local models at the final step can match the performance of server-based training.
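A minimal sketch of this kind of setup is given below, using NumPy and toy quadratic local objectives; the ring topology, the "communicate only in the last quarter of training" schedule, and all function and variable names are illustrative assumptions, not the paper's exact protocol.

```python
# Sketch: local SGD on each worker, gossip communication concentrated late in
# training, and a single global merge (average of all local models) at the end.
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim, n_steps, lr = 8, 10, 200, 0.05

# Each worker holds its own data; here, a random local quadratic target.
targets = rng.normal(size=(n_workers, dim))
models = np.zeros((n_workers, dim))

def local_grad(w, target):
    """Gradient of the local objective 0.5 * ||w - target||^2."""
    return w - target

def gossip_average(models):
    """One averaging round over a ring: each worker mixes with its two neighbours."""
    left = np.roll(models, 1, axis=0)
    right = np.roll(models, -1, axis=0)
    return (models + left + right) / 3.0

for t in range(n_steps):
    # Local SGD step on every worker.
    models -= lr * local_grad(models, targets)

    # Late-stage communication schedule (assumed): gossip only in the last quarter.
    if t >= 3 * n_steps // 4:
        models = gossip_average(models)

# Single global merge at the final step: average all local models into one.
global_model = models.mean(axis=0)
print("distance of merged model from mean target:",
      np.linalg.norm(global_model - targets.mean(axis=0)))
```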
On the theoretical side, the work shows that globally merged decentralized SGD can converge faster than centralized mini-batch SGD, challenging the common belief that decentralization necessarily degrades convergence.
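To make the comparison concrete, one standard way to write the quantities involved is shown below, assuming n workers, T steps, a smooth objective f, and gradient-noise variance sigma^2; the notation for the merged model and the baseline rate are textbook conventions, not results taken from this work.

```latex
% Globally merged decentralized model: uniform average of the workers' final
% iterates (assumed notation, not necessarily the paper's).
\bar{x}_T = \frac{1}{n} \sum_{i=1}^{n} x_T^{(i)}

% Textbook baseline: centralized mini-batch SGD with batch size n on a smooth
% nonconvex objective f with gradient-noise variance \sigma^2 satisfies
\min_{1 \le t \le T} \mathbb{E}\,\bigl\|\nabla f(x_t)\bigr\|^2
  = O\!\left(\frac{\sigma}{\sqrt{nT}} + \frac{1}{T}\right).
```

In these terms, the claim is that the merged decentralized iterates satisfy a convergence bound that can improve on this centralized baseline.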