Decision Trees

What is a decision tree?

A decision tree model, also called a partition model, is a flexible method for building classification and prediction models. A decision tree consists of a set of conditional rules that lead to a prediction.
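To make this concrete, a fitted tree can be written out as nested if/else rules. Here is a minimal Python sketch; the predictor names and cutoff values are hypothetical, chosen only for illustration.

# A decision tree expressed as nested conditional rules.
# The predictors and cutoffs here are hypothetical, not from a fitted model.
def predict_recovery_class(valve_pressure, shrinkage):
    """Return a predicted class for a binary response from two predictors."""
    if valve_pressure < 12.0:        # first split
        if shrinkage < 0.5:          # second split
            return "high recovery"
        return "low recovery"
    return "low recovery"

print(predict_recovery_class(valve_pressure=10.3, shrinkage=0.2))  # high recovery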


How do you build a decision tree?

To build a decision tree, begin by splitting the observed data into two partitions based on the value of one of the predictors. You find the predictor variable and its split value by examining all possible predictors and all candidate split values. The chosen predictor variable and split value are the ones that result in the most dissimilar predicted probabilities for a categorical response, or the most dissimilar predicted responses for a continuous response.
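As a rough sketch of this exhaustive search for a continuous response, the Python function below (written against NumPy, with hypothetical names) scans every predictor and every candidate split value and returns the split whose two partition means are most dissimilar, i.e., the split that minimizes the pooled sum of squared errors. A categorical response would use a different criterion, such as a likelihood-ratio or Gini statistic.

import numpy as np

def best_split(X, y):
    """Find the single best split of a continuous response y.

    X: 2-D array, one column per predictor; y: 1-D array of responses.
    Returns (predictor index, split value) minimizing the pooled sum of
    squared deviations of the two partitions around their own means.
    """
    best_col, best_value, best_sse = None, None, np.inf
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j])[:-1]:                  # candidate split values
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best_sse:
                best_col, best_value, best_sse = j, s, sse
    return best_col, best_value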

Then you repeat the process. Within each partition, select the predictor variable and split value that result in the biggest difference in predicted probabilities or mean response. You could continue the process until every terminal node contained just one observation; that model, however, would overfit the data. Instead, you should fit the model on a training set and compare fit statistics on a validation set at each split in the tree. You might choose to stop when the next 10 splits do not produce an improvement in your chosen fit statistic, then prune the tree (recombining splits) back to the point with the best fit statistic on the validation set.
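The stop-and-prune strategy can be sketched with scikit-learn, although this is only an approximation of how an interactive partition platform works: instead of adding splits to one growing tree, the sketch below refits a tree with an increasing leaf budget (one terminal node per additional split), tracks R-square on the validation set, and keeps the best tree seen once 10 consecutive splits fail to improve it.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

def grow_with_validation(X_train, y_train, X_valid, y_valid, patience=10):
    """Add one split at a time; stop after `patience` consecutive splits fail to
    improve validation R-square, then keep ("prune back to") the best tree seen."""
    best_r2, best_tree, stalls, n_splits = -np.inf, None, 0, 1
    while stalls < patience:
        # a binary tree with n_splits splits has n_splits + 1 terminal nodes
        tree = DecisionTreeRegressor(max_leaf_nodes=n_splits + 1, random_state=0)
        tree.fit(X_train, y_train)
        r2 = r2_score(y_valid, tree.predict(X_valid))
        if r2 > best_r2:
            best_r2, best_tree, stalls = r2, tree, 0
        else:
            stalls += 1
        n_splits += 1
    return best_tree, best_r2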

Learn how to fit a decision tree in JMP

https://www.youtube.com/watch?v=rwy54EksLZ4

Figure 1: Diagram of a decision tree model for a binary response with six splits. The two response categories are colored red and blue. The bar graph shows the proportion of observations in the node belonging to each response level.

What are the advantages and disadvantages of decision trees?

Advantages

Trees can be modified to improve predictions (see the sections on random decision forests and boosting below).

Disadvantages

Example of a decision tree

Let’s fit a decision tree model on the Recovery data we introduced in our overview of predictive modeling. We’ll look at the continuous response Percent Recovered.

Figure 2: Model performance vs. number of splits for training and validation sets. After 47 splits, no further improvement in R-square on the validation set is seen.

Without considering the validation set, the splitting procedure would finish with one observation in each terminal node of the tree, because R-square on the training data never decreases as splits are added. Instead, the procedure keeps splitting until no further improvement in R-square on the validation set is seen over the next 10 splits. The model is then pruned back (recombining nodes) to the point of maximum validation R-square.

Figure 3: Actual by predicted plots on training, validation, and test sets for the decision tree model.

Examination of the actual by predicted plot shows the discrete nature of the model. There are 47 splits in this model, so there are 48 predicted values of Percent Recovered.
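This correspondence is easy to check in code. Assuming a fitted scikit-learn regression tree named tree and training predictors X_train (hypothetical names carried over from the earlier sketch):

import numpy as np

# A regression tree predicts one value per terminal node, so the number of
# distinct predictions is at most the number of leaves (splits + 1).
print(tree.get_n_leaves())                      # e.g., 48 leaves after 47 splits
print(len(np.unique(tree.predict(X_train))))    # at most the same count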

Figure 4: The column contributions report shows that Valve 17 Pressure explains the most variability in Percent Recovered, followed by Vendor Code and Shrinkage.

What is a random decision forest?

A random decision forest, or bootstrap forest, is an ensemble model: a model that averages many smaller models. To create a random decision forest, start by drawing a bootstrap sample (i.e., a random sample with replacement) from your data set. Fit a small decision tree to this sample, considering only a random subset of the predictor variables (again sampled with replacement). Repeat the bootstrap sampling and small-tree fitting many times. This process creates a forest of many small trees. Create the final model by averaging the predictions of the trees in the forest.
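The idea can be sketched from scratch in Python. This is only an illustration with placeholder defaults, not JMP's Bootstrap Forest implementation, and here the predictor subset is drawn once per tree rather than redrawn at every split.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def fit_bootstrap_forest(X, y, n_trees=100, max_depth=3, n_predictors=3):
    """Fit many small trees, each on a bootstrap sample of the rows and a
    random subset of the predictor columns; keep the columns for prediction."""
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, len(y), size=len(y))                       # rows sampled with replacement
        cols = rng.choice(X.shape[1], size=n_predictors, replace=False)   # random predictor subset
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X[rows][:, cols], y[rows])
        forest.append((tree, cols))
    return forest

def predict_forest(forest, X_new):
    """Average the predictions of all the small trees."""
    return np.mean([tree.predict(X_new[:, cols]) for tree, cols in forest], axis=0)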

The resampling from both the data and the predictors is a key part of the random decision forest and is done to create diverse trees. Each small tree might not predict or classify well. However, when all of the trees are combined into a forest, the combined model can predict very well.

Random decision forests are not easy to interpret or explain, but you might not care about interpretation if prediction is the main goal of your analysis. Random decision forests can be used for exploratory modeling or as a first step in building other predictive models. For example, you can use a random decision forest to find important predictors, then use just those predictors in a linear model or neural network.
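A hypothetical version of that workflow, using scikit-learn's random forest importances to choose predictors for a simpler linear model (all variable names are placeholders):

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Rank predictors with a forest, then refit a simpler model on the top few.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
top = np.argsort(forest.feature_importances_)[::-1][:3]    # three most important predictors
linear_model = LinearRegression().fit(X_train[:, top], y_train)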

Boosted trees and boosted forests

Gradient boosting can be applied to both decision trees and random decision forests to improve the predictive power of the model.

Boosting is the process of building a large, additive decision tree by fitting a sequence of smaller trees. At each stage, the tree is grown on the scaled residuals from the previous stage. The magnitude of scaling is controlled by the learning rate.
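A bare-bones sketch of this idea for a continuous response with squared-error loss (an illustration, not JMP's Boosted Tree implementation): each stage fits a small tree to the current residuals and adds its prediction, scaled by the learning rate.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosted_trees(X, y, n_stages=100, learning_rate=0.1, max_depth=2):
    """Build an additive model: each stage fits a small tree to the residuals of
    the model so far; its contribution is scaled by the learning rate."""
    prediction = np.full(len(y), float(y.mean()))      # stage 0: constant model
    stages = []
    for _ in range(n_stages):
        residuals = y - prediction
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        prediction = prediction + learning_rate * tree.predict(X)
        stages.append(tree)
    return float(y.mean()), stages

def predict_boosted(model, X_new, learning_rate=0.1):
    """Sum the scaled stage predictions on top of the constant baseline."""
    base, stages = model
    return base + learning_rate * sum(tree.predict(X_new) for tree in stages)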