Batch Gradient Descent uses the entire dataset to compute the gradient and updates the parameters once per epoch. The gradient estimate is exact and the updates are stable, but each step is computationally expensive for large datasets.
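To make this concrete, here is a minimal NumPy sketch of batch gradient descent on a toy linear-regression problem. The synthetic data, learning rate, and epoch count are all illustrative assumptions, not values from the text.

```python
import numpy as np

# Illustrative setup: synthetic linear-regression data (assumed, not from the text).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                  # 1000 examples, 3 features
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=1000)

w = np.zeros(3)      # parameters to learn
lr = 0.1             # learning rate (arbitrary illustrative choice)

for epoch in range(100):
    # Gradient of the mean squared error over the ENTIRE dataset,
    # so the parameters are updated only once per epoch.
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= lr * grad
```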
Stochastic Gradient Descent updates the parameters after every individual example, using only one data point at a time. It is much cheaper per update, but the noisy gradients make the loss fluctuate rather than decrease smoothly.
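A sketch of the same toy problem updated one example at a time may help; again the data, learning rate, and epoch count are assumptions made for illustration.

```python
import numpy as np

# Same synthetic regression problem as above (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=1000)

w = np.zeros(3)
lr = 0.01

for epoch in range(10):
    for i in rng.permutation(len(y)):           # visit examples in a new random order each epoch
        # Gradient from ONE example: cheap per step, but a noisy
        # estimate of the true gradient, hence the jagged progress.
        grad = 2 * X[i] * (X[i] @ w - y[i])
        w -= lr * grad
```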
Mini-batch Gradient Descent divides the data into small batches and computes the gradient over each batch, balancing the accuracy of full-batch updates with the speed of stochastic updates. It is the most widely used variant in deep learning.
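The following sketch shows the mini-batch variant on the same toy problem; the batch size of 32, learning rate, and epoch count are arbitrary illustrative choices.

```python
import numpy as np

# Same synthetic regression problem as above (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=1000)

w = np.zeros(3)
lr = 0.05
batch_size = 32      # arbitrary illustrative batch size

for epoch in range(20):
    order = rng.permutation(len(y))             # shuffle once per epoch
    for start in range(0, len(y), batch_size):
        idx = order[start:start + batch_size]
        # Gradient averaged over a small batch: noisier than full-batch,
        # but far cheaper per update and smoother than per-example SGD.
        grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
        w -= lr * grad
```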
The choice of gradient descent algorithm depends on data size, computational resources, and model requirements. Mini-batch Gradient Descent is often preferred for its efficiency and balanced performance.