Recent theoretical results show that permutation-based Stochastic Gradient Descent (SGD), which processes the component functions without replacement within each epoch, can converge faster than SGD with uniform sampling.
Existing analyses, however, focus primarily on the large epoch regime, where the number of epochs exceeds the condition number; much less is known about the small epoch regime, where the number of epochs is smaller than the condition number.
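For concreteness, writing $K$ for the number of epochs and $\kappa = L/\mu$ for the condition number (notation assumed here for illustration, not taken from the original results), the two regimes can be stated as follows for a finite-sum objective whose components are $L$-smooth and whose sum is $\mu$-strongly convex:

```latex
\[
  F(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x), \qquad \kappa = \frac{L}{\mu},
\]
\[
  \text{large epoch regime: } K \gtrsim \kappa,
  \qquad
  \text{small epoch regime: } K \lesssim \kappa .
\]
```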
An analysis of Incremental Gradient Descent (IGD), the deterministic variant that visits the components in a fixed cyclic order, on smooth and strongly convex functions shows that its convergence is surprisingly slow in the small epoch regime.
When some of the component functions are nonconvex, the optimality gap of IGD in the small epoch regime can be notably worse still, highlighting how strongly the convergence guarantees depend on the assumptions placed on the individual components.
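As a purely illustrative sketch (not code from the results above), the sampling schemes being compared can be contrasted on a toy least-squares problem; the problem size, step size, and epoch count below are arbitrary assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy smooth, strongly convex finite sum: F(x) = (1/n) * sum_i 0.5 * (a_i @ x - b_i)^2
n, d = 100, 10
A = rng.normal(size=(n, d))
b = rng.normal(size=n)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]  # minimizer of the full objective

def grad_i(x, i):
    """Gradient of the i-th component f_i(x) = 0.5 * (A[i] @ x - b[i])**2."""
    return (A[i] @ x - b[i]) * A[i]

def run(order_fn, epochs=20, lr=1e-2):
    """Take n component-gradient steps per epoch, visiting indices given by order_fn."""
    x = np.zeros(d)
    for _ in range(epochs):
        for i in order_fn():
            x = x - lr * grad_i(x, i)
    return np.linalg.norm(x - x_star)

# Uniform-sampling SGD: each step draws a component index with replacement.
uniform = run(lambda: rng.integers(0, n, size=n))
# Permutation-based SGD (random reshuffling): a fresh permutation every epoch.
shuffled = run(lambda: rng.permutation(n))
# Incremental Gradient Descent (IGD): the same fixed order in every epoch.
igd = run(lambda: np.arange(n))

print(f"distance to x*: uniform={uniform:.4f}, shuffled={shuffled:.4f}, IGD={igd:.4f}")
```

Such a toy run only illustrates the mechanics of the three update orders; it is not intended to reproduce the theoretical rates discussed above.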