We revisit two fundamental decentralized optimization methods, Decentralized Gradient Tracking (DGT) and Decentralized Gradient Descent (DGD), with multiple local updates.
We show that incorporating local update steps can reduce communication complexity for strongly convex and smooth loss functions.
In particular, increasing the number of local updates can effectively reduce communication costs when data heterogeneity is low and the network is well connected.
We further show that DGD with local updates achieves exact linear convergence under the Polyak-Łojasiewicz (PL) condition.
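To illustrate the algorithmic pattern of local updates interleaved with communication, the sketch below runs DGD with multiple local gradient steps per gossip round on a toy decentralized least-squares problem. This is a minimal illustrative sketch, not the paper's setup: the function name `local_dgd`, the mixing matrix `W`, and all parameter choices are assumptions made only for the example.

```python
# Minimal sketch (assumed setup, not the paper's implementation): DGD where
# each agent performs several local gradient steps between communication
# (gossip averaging) rounds with a doubly stochastic mixing matrix W.
import numpy as np

def local_dgd(A, b, W, step=0.01, local_steps=5, rounds=200):
    """Each agent i holds f_i(x) = 0.5 * ||A_i x - b_i||^2 and takes
    `local_steps` gradient steps before one round of gossip averaging."""
    n_agents, _, dim = A.shape
    X = np.zeros((n_agents, dim))           # one iterate per agent (rows)
    for _ in range(rounds):
        for _ in range(local_steps):        # local updates: no communication
            residual = np.einsum('nij,nj->ni', A, X) - b
            grads = np.einsum('nij,nj->ni', A.transpose(0, 2, 1), residual)
            X = X - step * grads
        X = W @ X                           # one communication (gossip) round
    return X

# Toy usage: 4 agents on a ring graph, each with its own local quadratic.
rng = np.random.default_rng(0)
n, m, d = 4, 10, 3
A = rng.standard_normal((n, m, d))
b = rng.standard_normal((n, m))
W = np.array([[0.50, 0.25, 0.00, 0.25],    # symmetric, doubly stochastic
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
X = local_dgd(A, b, W)
print("max disagreement across agents:", np.abs(X - X.mean(axis=0)).max())
```

The inner loop captures the communication savings discussed above: gradient work is done locally, and agents exchange iterates only once per outer round.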