Neural networks consist of interconnected nodes organized into an input layer, one or more hidden layers, and an output layer, with learnable weights and biases connecting them and activation functions applied at each node.
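As a minimal sketch of that structure, the PyTorch snippet below stacks one hidden layer and one output layer; the layer sizes (4 inputs, 8 hidden units, 3 outputs) are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

# Minimal fully connected network: input -> hidden layer -> output layer.
model = nn.Sequential(
    nn.Linear(4, 8),   # weights (8x4) and biases (8) of the hidden layer
    nn.ReLU(),         # activation function introducing non-linearity
    nn.Linear(8, 3),   # weights (3x8) and biases (3) of the output layer
)

x = torch.randn(2, 4)   # a batch of 2 examples with 4 features each
print(model(x).shape)   # torch.Size([2, 3])
```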
Activation functions such as Sigmoid, Tanh, ReLU, Leaky ReLU, ELU, and Softmax introduce non-linearity into neural networks, with Softmax typically applied at the output layer for classification.
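The NumPy definitions below are one straightforward way to write these functions; the test vector is made up for illustration.

```python
import numpy as np

def sigmoid(x):            return 1.0 / (1.0 + np.exp(-x))
def tanh(x):               return np.tanh(x)
def relu(x):               return np.maximum(0.0, x)
def leaky_relu(x, a=0.01): return np.where(x > 0, x, a * x)
def elu(x, a=1.0):         return np.where(x > 0, x, a * (np.exp(x) - 1))

def softmax(x):
    # Subtract the max for numerical stability; the output sums to 1.
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))     # [0.  0.  0.  0.5 2. ]
print(softmax(x))  # non-negative values summing to 1
```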
Backpropagation is crucial for neural networks to learn efficiently: it applies the chain rule to compute the gradient of the loss with respect to every parameter, and those gradients drive the parameter updates.
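A toy illustration of the idea, using a single made-up training example and a one-hidden-unit network with arbitrarily chosen starting values:

```python
import numpy as np

# Two-layer network: h = tanh(w1*x + b1), y_hat = w2*h + b2, squared-error loss.
x, y = 1.5, 0.3
w1, b1, w2, b2 = 0.4, 0.0, -0.6, 0.0
lr = 0.1

for step in range(3):
    # Forward pass
    z = w1 * x + b1
    h = np.tanh(z)
    y_hat = w2 * h + b2
    loss = 0.5 * (y_hat - y) ** 2

    # Backward pass: chain rule applied from the output back toward the input
    d_yhat = y_hat - y                 # dL/dy_hat
    d_w2, d_b2 = d_yhat * h, d_yhat    # gradients of the output layer
    d_h = d_yhat * w2                  # error propagated to the hidden unit
    d_z = d_h * (1 - np.tanh(z) ** 2)  # through the tanh derivative
    d_w1, d_b1 = d_z * x, d_z          # gradients of the first layer

    # Gradient-descent update
    w1 -= lr * d_w1; b1 -= lr * d_b1
    w2 -= lr * d_w2; b2 -= lr * d_b2
    print(f"step {step}: loss={loss:.4f}")
```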
The vanishing/exploding gradient problem in deep networks can be mitigated through techniques like careful weight initialization, batch normalization, and, in recurrent networks, gated architectures such as LSTM and GRU.
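A rough way to see the effect is to compare how much gradient reaches the first layer of a deep sigmoid stack versus a ReLU stack with He initialization; the depth, width, and seed below are arbitrary choices and the exact numbers will vary.

```python
import torch
import torch.nn as nn

def first_layer_grad_norm(act, init=None, depth=20, width=64):
    # Build a deep stack of Linear + activation layers and measure how much
    # gradient reaches the first layer after one backward pass.
    layers = []
    for _ in range(depth):
        lin = nn.Linear(width, width)
        if init is not None:
            init(lin.weight)
        layers += [lin, act()]
    net = nn.Sequential(*layers)
    x = torch.randn(8, width)
    net(x).sum().backward()
    return net[0].weight.grad.norm().item()

torch.manual_seed(0)
# Sigmoid saturates, so gradients shrink layer by layer (vanishing gradients).
print("sigmoid, default init:", first_layer_grad_norm(nn.Sigmoid))
# ReLU plus He (Kaiming) initialization keeps gradient magnitudes healthier.
print("relu, He init:        ",
      first_layer_grad_norm(nn.ReLU, init=nn.init.kaiming_normal_))
```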
CNNs, RNNs, and Transformers are specialized architectures suited to different data types: CNNs for images, RNNs for sequential data, and Transformers for text and other sequence tasks.
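One representative PyTorch layer per family, with illustrative tensor shapes:

```python
import torch
import torch.nn as nn

images = torch.randn(4, 3, 32, 32)  # batch of 4 RGB 32x32 images
tokens = torch.randn(4, 10, 16)     # batch of 4 sequences, 10 steps, 16 features

conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
print(conv(images).shape)           # torch.Size([4, 8, 32, 32])

rnn = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
out, (h, c) = rnn(tokens)
print(out.shape)                    # torch.Size([4, 10, 32])

encoder = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True)
print(encoder(tokens).shape)        # torch.Size([4, 10, 16])
```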
Neural network optimizers like SGD, Adam, RMSprop, and AdaGrad adjust parameters to minimize loss, each with trade-offs in convergence and generalization.
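A sketch of swapping optimizers on the same toy regression problem; the data, learning rates, and step count are arbitrary choices, and the final losses are not a benchmark.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x, y = torch.randn(64, 10), torch.randn(64, 1)

for name, make_opt in {
    "SGD":     lambda p: torch.optim.SGD(p, lr=0.01, momentum=0.9),
    "Adam":    lambda p: torch.optim.Adam(p, lr=0.001),
    "RMSprop": lambda p: torch.optim.RMSprop(p, lr=0.001),
    "Adagrad": lambda p: torch.optim.Adagrad(p, lr=0.01),
}.items():
    model = nn.Linear(10, 1)
    opt = make_opt(model.parameters())
    for _ in range(100):
        opt.zero_grad()                        # clear old gradients
        loss = F.mse_loss(model(x), y)
        loss.backward()                        # compute gradients
        opt.step()                             # update parameters
    print(f"{name:8s} final loss: {loss.item():.4f}")
```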
Proper weight initialization, using strategies such as Xavier, He, and LSUV, is essential to prevent vanishing/exploding gradients and keep network training stable.
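Xavier and He are available directly in PyTorch's nn.init module, as below; LSUV is data-dependent and omitted here. The layer size is illustrative.

```python
import torch
import torch.nn as nn

layer = nn.Linear(256, 256)

# Xavier (Glorot): variance scaled by fan_in + fan_out,
# a common choice for tanh/sigmoid layers.
nn.init.xavier_uniform_(layer.weight)
print(layer.weight.std().item())

# He (Kaiming): variance scaled by fan_in,
# designed for ReLU-family activations.
nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")
print(layer.weight.std().item())

# Biases are typically initialized to zero.
nn.init.zeros_(layer.bias)
```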
Batch normalization normalizes layer inputs, reducing internal covariate shift, improving training speed, and aiding convergence in deep networks.
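A small check of what BatchNorm1d does to a batch whose features have a large mean and variance (the numbers are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(32, 8) * 5 + 3   # batch of 32, 8 features, roughly mean 3 / std 5

# Batch normalization standardizes each feature over the batch, then applies a
# learnable scale (gamma) and shift (beta), which start at 1 and 0.
bn = nn.BatchNorm1d(num_features=8)
y = bn(x)

print(x.mean(dim=0)[:3], x.std(dim=0)[:3])  # roughly mean 3, std 5
print(y.mean(dim=0)[:3], y.std(dim=0)[:3])  # roughly mean 0, std 1
```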
Combating overfitting in neural networks involves data augmentation, dropout, early stopping, and regularization techniques that improve generalization to unseen data.
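A sketch combining dropout and L2 regularization (via weight decay), with early stopping left as commented pseudocode because train_epoch and val_loss are hypothetical helpers, not real functions.

```python
import torch
import torch.nn as nn

# A small classifier with two regularizers: dropout between layers and
# L2 regularization through the optimizer's weight_decay argument.
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zero 50% of activations during training
    nn.Linear(64, 2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

# Early-stopping sketch: stop when validation loss has not improved for
# `patience` consecutive epochs.
best_val, patience, bad_epochs = float("inf"), 5, 0
# for epoch in range(100):
#     train_epoch(model, optimizer)      # hypothetical training helper
#     val = val_loss(model)              # hypothetical validation helper
#     if val < best_val:
#         best_val, bad_epochs = val, 0
#     else:
#         bad_epochs += 1
#         if bad_epochs >= patience:
#             break
```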
Embeddings in neural networks are low-dimensional vector representations of categorical variables that capture semantic relationships and can transfer learned knowledge to related tasks.
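A minimal example with PyTorch's nn.Embedding; the vocabulary size, embedding dimension, and IDs are illustrative assumptions.

```python
import torch
import torch.nn as nn

# An embedding table maps each of 10,000 category/word IDs to a learned
# 64-dimensional vector.
embedding = nn.Embedding(num_embeddings=10_000, embedding_dim=64)

ids = torch.tensor([[12, 403, 9981], [7, 7, 250]])  # batch of 2 ID sequences
vectors = embedding(ids)
print(vectors.shape)   # torch.Size([2, 3, 64])

# Relationships between items can be measured by comparing their vectors,
# e.g. with cosine similarity (this table is untrained, so values are arbitrary).
sim = torch.nn.functional.cosine_similarity(vectors[0, 0], vectors[0, 1], dim=0)
print(sim.item())
```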