Vanishing gradients occur when gradients shrink toward zero as they are propagated backward through a deep network.
As a result, early layers receive little to no learning signal, their weights barely update, and training stalls.
Vanishing gradients are common in networks with sigmoid or tanh activations, many layers, and poorly chosen initial weights: these activations saturate and have derivatives below one, so by the chain rule the product of many such factors across layers shrinks rapidly toward zero, as the short experiment below illustrates.
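As a minimal sketch of this effect, the following PyTorch snippet stacks sigmoid layers and compares the gradient norm of the first and last layers after one backward pass. The depth, width, and initialization scale are arbitrary illustrative choices, not values taken from the text.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative choices: 20 sigmoid layers of width 64 with smallish weights.
depth, width = 20, 64
layers = []
for _ in range(depth):
    linear = nn.Linear(width, width)
    nn.init.normal_(linear.weight, std=0.5)  # modest weights worsen the shrinkage
    layers += [linear, nn.Sigmoid()]
model = nn.Sequential(*layers)

x = torch.randn(8, width)
loss = model(x).pow(2).mean()
loss.backward()

# The first layer's gradient is typically orders of magnitude smaller than
# the last layer's, i.e. the gradient has vanished on its way back.
first, last = layers[0], layers[-2]
print(f"first-layer grad norm: {first.weight.grad.norm():.2e}")
print(f"last-layer grad norm:  {last.weight.grad.norm():.2e}")
```

Rerunning with a tanh activation or an even smaller initialization scale makes the gap between the two norms wider still.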
To mitigate vanishing gradients, common remedies include ReLU activations, batch normalization, careful weight initialization (for example, He/Kaiming initialization), and skip connections; a sketch combining them follows.
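The block below is one assumed way to combine these mitigations in PyTorch, not a prescribed architecture: a residual block with ReLU, batch normalization, Kaiming initialization, and a skip connection. The class name `ResidualBlock` and the chosen width are illustrative.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Fully connected residual block combining the listed mitigations."""

    def __init__(self, width: int):
        super().__init__()
        self.fc1 = nn.Linear(width, width)
        self.fc2 = nn.Linear(width, width)
        self.bn1 = nn.BatchNorm1d(width)
        self.bn2 = nn.BatchNorm1d(width)
        self.relu = nn.ReLU()
        # Kaiming (He) initialization is matched to the ReLU nonlinearity.
        for fc in (self.fc1, self.fc2):
            nn.init.kaiming_normal_(fc.weight, nonlinearity="relu")
            nn.init.zeros_(fc.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.fc1(x)))
        out = self.bn2(self.fc2(out))
        # The skip connection gives gradients a direct path to earlier layers.
        return self.relu(out + x)

block = ResidualBlock(64)
y = block(torch.randn(8, 64))
print(y.shape)  # torch.Size([8, 64])
```

The identity path in `forward` is what lets gradients bypass the transformed branch, so even a deep stack of such blocks keeps a usable signal flowing to its earliest layers.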