Boosting Predictive Models

What is gradient boosting?

Gradient boosting is a method for improving the predictive power of a model. Boosting builds a large model by fitting a sequence of smaller models, and at each stage the new model is fit to the residuals from the previous stage. Because the component models are small, early iterations leave much of the variance unexplained. Each iteration recovers a little more of that variance, and the accumulated model typically predicts better than a single large model fit from the start.

Boosted decision tree or random forest

To fit a boosted decision tree model, first fit the simplest possible model: the predicted values are the mean of the (continuous) response. So after this first step, the predicted value of Y is simply the mean of Y. Next, find the residuals from that fit. Ideally, the residuals from a model contain only noise and no signal; this early in the process, they still contain lots of signal along with some noise. We want to learn slowly, so we multiply the residuals by the learning rate, a small number, usually no greater than 0.1.

Next, build a short tree on the scaled residuals. We want a short tree because, again, we don’t want to explain all the variability early. Fit the tree, find its residuals, and scale those residuals. Continue this process, fitting a short tree to the scaled residuals from the previous step, until no further predictive improvement is found.
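Here is a minimal sketch of that loop in Python, assuming scikit-learn’s DecisionTreeRegressor stands in for the short tree; the fixed stage count and depth are illustrative choices, and the per-stage weights from the formula below are fixed at 1 for simplicity:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_trees(X, y, n_stages=100, learning_rate=0.1, max_depth=2):
    """Fit a sequence of short trees, each to the scaled residuals of the last."""
    prediction = np.full(len(y), y.mean())  # step 0: predict the mean of Y
    trees = []
    for _ in range(n_stages):
        residuals = y - prediction           # signal the model hasn't explained yet
        scaled = learning_rate * residuals   # learn slowly
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, scaled)
        prediction += tree.predict(X)        # add this stage's contribution
        trees.append(tree)
    return y.mean(), trees

def boost_predict(y_bar, trees, X):
    """Prediction = mean of Y plus the sum of every stage's short tree."""
    return y_bar + sum(tree.predict(X) for tree in trees)
```

In practice you would stop adding stages when performance on a validation set stops improving, rather than after a fixed count.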

Cartoon of the gradient boosting process on a decision tree model

The predictions at step m have a nice form: $\hat{y} = \bar{y} + \sum_{i=1}^{m} \nu \beta_i T_i(x)$. They are the mean plus the learning rate $\nu$ times a weighted sum of the trees $T_i$, each fit to the scaled residuals from the previous step.
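Unrolled, the sum matches the fitting procedure step by step. A sketch of the recursion (with $\beta_m$ the weight estimated for stage $m$):

$$
\hat{y}_0(x) = \bar{y}, \qquad r_m = y - \hat{y}_{m-1}(x), \qquad \hat{y}_m(x) = \hat{y}_{m-1}(x) + \nu\,\beta_m T_m(x)
$$

For least-squares trees, fitting $T_m$ to the scaled residuals $\nu r_m$ (as described above) and adding its predictions directly is equivalent to fitting $T_m$ to the raw residuals $r_m$ and shrinking its predictions by $\nu$, so both descriptions lead to the same model.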

The smaller model could be a decision tree or random forest. Let’s fit a boosted random forest model to the Percent Recovered response in the Recovery data (using the JMP data set found in our introduction to predictive modeling).

Specification of parameters for a boosted random forest.

Model fit statistics for a decision tree model (top) and boosted random forest model (bottom). R-square on the validation set is higher for the boosted random forest (0.819) than the decision tree (0.633).

Boosted neural network

Gradient boosting can be used to improve the predictive power of neural network models as well. At each stage, you build a small neural network on the scaled residuals from the previous stage.
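The loop is the same as before, with a small neural network in place of the short tree. A minimal sketch, again assuming scikit-learn (MLPRegressor here is a stand-in, not JMP’s neural platform):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def boost_neural(X, y, n_stages=20, learning_rate=0.1, hidden=(3,)):
    """Each stage fits a small neural network to the scaled residuals of the last."""
    prediction = np.full(len(y), y.mean())  # step 0: predict the mean of Y
    nets = []
    for _ in range(n_stages):
        scaled = learning_rate * (y - prediction)  # scaled residuals
        net = MLPRegressor(hidden_layer_sizes=hidden, max_iter=2000).fit(X, scaled)
        prediction += net.predict(X)               # add this stage's contribution
        nets.append(net)
    return y.mean(), nets
```

Keeping each network small (here a single hidden layer of three nodes) plays the same role as keeping the trees short: no single stage is allowed to explain too much of the variability.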

Let’s fit a boosted neural model to the Percent Recovered response in the Recovery data.

Model specification for a boosted neural network.

Model fit statistics for a neural network (top) and boosted neural network (bottom). R-square on the validation set is larger for the boosted neural network (0.799) than the neural network (0.760).