Using entropic inequalities from information theory, new bounds on the total variation and 2-Wasserstein distances between conditionally Gaussian and Gaussian laws are provided.
The results are applied to quantify the rate at which a randomly initialized fully connected neural network and its derivatives converge to Gaussian distributions.
The findings improve on and extend previous results on the subject. Estimates on cumulants and assumptions on the activation function play a crucial role in the analysis.
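The following is a small self-contained numerical sketch (in Python with NumPy and SciPy), not taken from the paper: the one-hidden-layer ReLU architecture, the standard normal initialization, and the helper names network_output and w2_to_gaussian are illustrative assumptions. It only illustrates the phenomenon being quantified: the scalar output of a wide, randomly initialized fully connected network at a fixed input approaches its limiting Gaussian law, with the one-dimensional 2-Wasserstein distance (approximated via quantile coupling) shrinking as the width grows.

```python
# A minimal sketch, not the paper's construction: the architecture, the
# initialization, and the helper names below are illustrative assumptions.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def network_output(x, width, n_samples):
    """Sample f(x) = width**-0.5 * sum_i v_i * relu(w_i . x + b_i) over
    independent standard Gaussian initializations of (w, b, v)."""
    d = x.shape[0]
    w = rng.standard_normal((n_samples, width, d))
    b = rng.standard_normal((n_samples, width))
    v = rng.standard_normal((n_samples, width))
    hidden = np.maximum(np.einsum("swd,d->sw", w, x) + b, 0.0)  # ReLU layer
    return (v * hidden).sum(axis=1) / np.sqrt(width)

def w2_to_gaussian(samples, mu, sigma):
    """Quantile-coupling approximation of the 1-D 2-Wasserstein distance
    between the empirical law of `samples` and N(mu, sigma**2)."""
    s = np.sort(samples)
    u = (np.arange(s.size) + 0.5) / s.size
    return np.sqrt(np.mean((s - norm.ppf(u, loc=mu, scale=sigma)) ** 2))

x = np.array([1.0, -0.5, 2.0])
# For this initialization, w . x + b ~ N(0, ||x||^2 + 1) and
# E[relu(z)^2] = Var(z) / 2, so the limiting output law is N(0, (||x||^2 + 1) / 2).
limit_sigma = np.sqrt((x @ x + 1.0) / 2.0)

for width in (8, 32, 128, 512):
    out = network_output(x, width, n_samples=4000)
    print(f"width {width:4d}: W2 to limiting Gaussian ~ "
          f"{w2_to_gaussian(out, 0.0, limit_sigma):.4f}")
```

The printed distances should decrease as the width grows, which is the empirical counterpart of the quantitative convergence rates discussed above; the paper's bounds concern the law of the network (and its derivatives), not this particular Monte Carlo estimator.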