Forward-mode automatic differentiation (FmAD) and zero-order (ZO) optimization have been proposed as memory-efficient alternatives to backpropagation (BP) for gradient computation.
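To make the contrast concrete, here is a minimal sketch (not from the study) of how the three approaches obtain a gradient signal for the same scalar loss, written in JAX; the loss, shapes, and variable names are illustrative assumptions. BP returns the exact gradient in one reverse pass, while FmAD and ZO each build a noisy single-sample estimate from one random direction.

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # Simple quadratic loss on a linear model; stands in for any scalar objective.
    return jnp.mean((x @ w - y) ** 2)

w = jax.random.normal(jax.random.PRNGKey(0), (8,))
x = jax.random.normal(jax.random.PRNGKey(1), (32, 8))
y = jax.random.normal(jax.random.PRNGKey(2), (32,))

# Backpropagation (reverse mode): exact gradient in a single backward pass.
g_bp = jax.grad(loss)(w, x, y)

# FmAD: a Jacobian-vector product gives the directional derivative along a
# random tangent v; d * v is an unbiased single-sample gradient estimate.
v = jax.random.normal(jax.random.PRNGKey(3), w.shape)
_, d = jax.jvp(lambda w_: loss(w_, x, y), (w,), (v,))
g_fmad = d * v

# ZO: two forward evaluations approximate the same directional derivative by
# finite differences, using no derivative computation at all.
eps = 1e-3
d_zo = (loss(w + eps * v, x, y) - loss(w - eps * v, x, y)) / (2 * eps)
g_zo = d_zo * v
```

Because FmAD and ZO recover only one directional derivative per evaluation, many perturbations (or many steps) are needed to match the information BP extracts in a single pass, which is the source of the convergence and computation gap discussed below.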
A new study presents a systematic comparison of BP, FmAD, and ZO methods, combining theoretical analysis with empirical evaluation.
Theoretical analysis suggests that FmAD and ZO reduce memory usage, but at the cost of lower accuracy, slower convergence, and greater total computation compared to BP with checkpointing.
Empirical experiments on large models show that BP with checkpointing outperforms FmAD and ZO variants, indicating that it remains the most effective strategy for training in memory-constrained settings.
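For reference, activation checkpointing itself is straightforward to apply; the sketch below, a hypothetical stack of layers in JAX (not the study's code), shows the idea. Wrapping each block with `jax.checkpoint` discards its activations after the forward pass and recomputes them during the backward pass, so BP's memory footprint shrinks at the price of extra forward compute.

```python
import jax
import jax.numpy as jnp

def block(h, w):
    # One illustrative layer; its activations would normally be stored for BP.
    return jnp.tanh(h @ w)

def model_loss(params, x):
    h = x
    for w in params:
        # Recompute this block's activations in the backward pass instead of storing them.
        h = jax.checkpoint(block)(h, w)
    return jnp.mean(h ** 2)

params = [jax.random.normal(jax.random.PRNGKey(i), (16, 16)) for i in range(4)]
x = jax.random.normal(jax.random.PRNGKey(99), (8, 16))

# Exact gradients via BP, with per-block activation memory traded for recomputation.
grads = jax.grad(model_loss)(params, x)
```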