A study explores the connection between gradient-based optimization of parametric models like neural networks and optimization of linear combinations of random features.
The main finding is that if a parametric model can be learned by mini-batch stochastic gradient descent (bSGD) without any assumptions on the data distribution, then, with high probability, the target function can be approximated by a polynomial-sized linear combination of random features.
The required number of random features depends on the number of gradient steps and on the numerical precision used in the bSGD process.
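As a rough illustration of the two model classes the result relates, the sketch below trains a small parametric model (a two-layer network) with mini-batch SGD and, separately, fits a linear combination of random features on a frozen random embedding. This is not the paper's construction or bound; the architecture, feature count, and hyperparameters are arbitrary choices made for the example.

```python
# Illustrative sketch only (not the paper's reduction): compare a parametric
# model trained by mini-batch SGD with a linear combination of random features.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data; neither learner is told the data distribution.
d, n = 10, 2000
X = rng.normal(size=(n, d))
y = np.sin(X @ rng.normal(size=d))           # some unknown target function

# (a) Parametric model: two-layer ReLU network trained by mini-batch SGD.
width, batch, steps, lr = 64, 32, 2000, 0.05  # arbitrary illustrative values
W = rng.normal(size=(d, width)) / np.sqrt(d)
a = np.zeros(width)
for _ in range(steps):
    idx = rng.integers(0, n, size=batch)
    h = np.maximum(X[idx] @ W, 0.0)           # ReLU hidden layer
    pred = h @ a
    grad_out = (pred - y[idx]) / batch        # squared-loss gradient
    a -= lr * h.T @ grad_out
    W -= lr * X[idx].T @ (np.outer(grad_out, a) * (h > 0))

# (b) Random-features model: random frozen features, only the linear
# coefficients are learned (here via ridge-regularized least squares).
num_features = 512                            # stand-in for "polynomial-sized"
V = rng.normal(size=(d, num_features)) / np.sqrt(d)
Phi = np.maximum(X @ V, 0.0)                  # random ReLU features
coef = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(num_features), Phi.T @ y)

print("SGD-trained net MSE:      ", np.mean((np.maximum(X @ W, 0.0) @ a - y) ** 2))
print("Random-features model MSE:", np.mean((Phi @ coef - y) ** 2))
```

The contrast is the point of the sketch: in (a) all parameters move under gradient updates, while in (b) the features are fixed at random and only the linear coefficients on top of them are fitted, which is the sense in which the result says distribution-free bSGD learning can be matched by a sufficiently large (polynomial-sized) random-features model.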
The study highlights the limitations of distribution-free learning for neural networks trained by gradient descent and emphasizes that, in practice, making assumptions about the data distribution matters.