Bi-level optimization has become crucial for hierarchical machine learning problems, but traditional gradient-based bi-level optimization algorithms do not scale well to large applications.
A new approach, FG2U (Forward Gradient Unrolling with Forward Gradient), is introduced; it provides more accurate gradient estimates and supports parallel computing.
FG2U can be used in different stages of the training process and is easily implemented in deep learning frameworks.
Extensive evaluations demonstrate the superior performance of FG2U in diverse large-scale bi-level optimization tasks.
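
To make the "forward gradient" in the method's name concrete, the following is a minimal JAX sketch of the generic forward-gradient estimator: sample a random direction, compute one directional derivative in forward mode, and scale the direction by it. This is not the FG2U algorithm itself, which the summary above does not spell out. The names `forward_gradient` and `toy_objective`, the Gaussian sampling of directions, and the sample count are all assumptions chosen for illustration.

```python
import jax
import jax.numpy as jnp


def forward_gradient(f, x, key):
    """One-sample forward-gradient estimate of grad f(x) (illustrative helper)."""
    # Random tangent direction with E[v v^T] = I (assumed Gaussian here).
    v = jax.random.normal(key, x.shape)
    # Forward-mode pass: returns f(x) and the directional derivative <grad f(x), v>.
    _, dir_deriv = jax.jvp(f, (x,), (v,))
    # Scaling v by the directional derivative gives an unbiased gradient estimate.
    return dir_deriv * v


def toy_objective(x):
    # Hypothetical smooth scalar objective, used only to exercise the estimator.
    return jnp.sum(x ** 2) + jnp.sin(x[0])


x = jnp.array([1.0, -2.0, 0.5])
keys = jax.random.split(jax.random.PRNGKey(0), 50000)

# Each sample is independent, so the estimates can be computed in parallel (here via vmap).
samples = jax.vmap(lambda k: forward_gradient(toy_objective, x, k))(keys)
estimate = jnp.mean(samples, axis=0)

# Reverse-mode gradient, used only as a reference for comparison.
exact = jax.grad(toy_objective)(x)
```

Averaging more samples brings `estimate` closer to `exact`. Because the samples do not depend on each other, they can be evaluated in parallel, and no reverse pass through the computation is needed.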