In gradient boosting, the initial prediction y_hat is improved step by step; a loss function quantifies the difference between the predicted and true labels and guides each improvement.
Common choices are Mean Squared Error (MSE) for regression tasks and cross-entropy loss for classification tasks.
The negative derivative of the loss function with respect to y_hat, called the pseudo-residual, indicates what needs to be added to the model to decrease the loss.
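As a minimal sketch, assuming squared-error loss of the form L(y, y_hat) = (y - y_hat)^2 / 2, the negative derivative with respect to y_hat is simply y - y_hat, so the pseudo-residuals reduce to the ordinary residuals:

```python
import numpy as np

# For L(y, y_hat) = 0.5 * (y - y_hat)**2, the negative gradient with respect
# to y_hat is (y - y_hat): the pseudo-residual equals the ordinary residual.
def pseudo_residuals(y, y_hat):
    return y - y_hat

y = np.array([3.0, 5.0, 8.0])         # true labels (illustrative values)
y_hat = np.full_like(y, y.mean())     # initial constant prediction
r = pseudo_residuals(y, y_hat)        # targets for the next weak learner
```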
A second weak learner, f_1(x), is trained on the values of the pseudo-residual r, and adding it to the current model produces the new model F_1(x) = F_0(x) + gamma_0 * f_1(x).
The constant gamma_0 scales the output of f_1(x), controlling how much of f_1(x) is added to the model so that the loss is minimized.
Line search is typically used to find the optimal value for gamma_0 that minimizes the loss.
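One way to picture this line search is as a one-dimensional optimization over gamma; the sketch below assumes squared-error loss and uses SciPy's scalar minimizer (the function name find_gamma and its arguments are illustrative, not from the text above):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def find_gamma(y, F_prev, f_new):
    """Line search: choose gamma minimizing the squared-error loss of the
    updated model F_prev(x) + gamma * f_new(x) on the training set."""
    loss = lambda gamma: np.mean((y - (F_prev + gamma * f_new)) ** 2)
    return minimize_scalar(loss).x
```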
Creating subsequent models involves repeating the process: calculate the negative derivative of the loss at the current predictions, fit a new weak learner to those pseudo-residuals, and determine the new gamma value by line search.
The goal is to iteratively improve the model's prediction by adding new weak learners to the ensemble.
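To tie the steps together, the following sketch shows the full iterative loop, assuming squared-error loss and shallow scikit-learn regression trees as the weak learners (the function names gradient_boost and predict are illustrative, and the closed-form gamma is specific to squared-error loss):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=100):
    """Minimal gradient boosting loop for squared-error loss."""
    F = np.full(len(y), y.mean())            # F_0: initial constant prediction
    trees, gammas = [], []
    for _ in range(n_rounds):
        r = y - F                            # pseudo-residuals (negative gradient of MSE)
        tree = DecisionTreeRegressor(max_depth=2).fit(X, r)
        f = tree.predict(X)
        gamma = np.dot(r, f) / np.dot(f, f)  # closed-form line search for squared error
        F = F + gamma * f                    # F_m(x) = F_{m-1}(x) + gamma * f_m(x)
        trees.append(tree)
        gammas.append(gamma)
    return y.mean(), trees, gammas

def predict(X, base, trees, gammas):
    """Ensemble prediction: initial constant plus the scaled weak learners."""
    return base + sum(g * t.predict(X) for g, t in zip(gammas, trees))
```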