Researchers propose both serial and parallel proximal (linearized) alternating direction method of multipliers (ADMM) algorithms for training residual neural networks.
The proposed algorithms mitigate the exploding gradient issue and are suitable for parallel and distributed training through regional updates.
The algorithms are shown to converge at an R-linear rate, for both the iterates and the objective-function values.
Experimental results validate the proposed ADMM algorithms, showing rapid and stable convergence, improved performance, and high computational efficiency.
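To illustrate the general mechanics of a proximal (linearized) ADMM iteration, the sketch below applies it to a toy lasso problem rather than to residual-network training; the problem data, step size `eta`, and penalty `rho` are all hypothetical choices for this example and do not come from the proposed algorithms. The smooth subproblem is handled with a single linearized gradient step, the nonsmooth subproblem with an exact proximal (soft-thresholding) step.

```python
import numpy as np

# Toy linearized ADMM for: min_x 0.5*||Ax - b||^2 + lam*||z||_1  s.t.  x = z.
# A hypothetical illustration of the proximal/linearized ADMM pattern,
# not the paper's residual-network algorithm.

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def linearized_admm(A, b, lam=0.1, rho=1.0, iters=500):
    n = A.shape[1]
    # Step size below 1/(L + rho), with L the Lipschitz constant of the
    # smooth term's gradient (largest eigenvalue of A^T A).
    L = np.linalg.norm(A, 2) ** 2
    eta = 1.0 / (L + rho)
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)  # scaled dual variable
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                  # gradient of the smooth term
        x = x - eta * (grad + rho * (x - z + u))  # linearized x-update: one gradient step
        z = soft_threshold(x + u, lam / rho)      # exact proximal z-update
        u = u + x - z                             # dual ascent on the consensus constraint
    return z

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20)
x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = linearized_admm(A, b)
```

The linearized x-update is what makes each iteration cheap: it avoids solving a linear system exactly, at the cost of requiring a step size tied to the smooth term's curvature.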