Overview of the Bootstrap Forest Platform

The Bootstrap Forest platform predicts a response value by averaging the predicted response values across many decision trees. Each tree is grown on a bootstrap sample of the training data. A bootstrap sample is a random sample of observations, drawn with replacement. In addition, a random subset of the predictors is considered at each split in the decision tree. Each decision tree is fit using the recursive partitioning methodology described in Partition Models.

This is the fitting process for the training set:

1. For each tree, select a bootstrap sample of observations.

2. Fit the individual decision tree, using recursive partitioning.

– Select a random set of predictors for each split.

– Continue splitting until a stopping rule that is specified in the Bootstrap Forest Specification window is met.

3. Repeat step 1 and step 2 until the number of trees specified in the Bootstrap Forest Specification window is reached or until Early Stopping occurs.

For an individual tree, the bootstrap sample of observations that is used to fit the tree is drawn with replacement. You can specify the proportion of observations to be sampled. The observations used in fitting the tree are called in-bag observations. The observations that are not used are called out-of-bag observations. Because the observations are drawn with replacement, some are left unused even if you specify that 100% of the observations are to be sampled. In that case, the expected proportion of out-of-bag observations is approximately 1/e, or 36.8%. For a continuous response, the Bootstrap Forest platform provides measures of the error for out-of-bag observations, called out-of-bag error.
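The 36.8% figure can be checked numerically. With n observations and n draws with replacement, a given observation is missed by every draw with probability (1 - 1/n)^n, which approaches 1/e as n grows. The sketch below compares that value with 1/e and with a single simulated bootstrap sample. The function names are illustrative only.

```python
import math
import random

def expected_oob_fraction(n):
    """Probability that a given row is never drawn in n draws with replacement."""
    return (1 - 1 / n) ** n

def simulated_oob_fraction(n, rng):
    """Fraction of rows left out of one simulated bootstrap sample of size n."""
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1 - len(drawn) / n

rng = random.Random(1)
for n in (10, 100, 10_000):
    print(n, round(expected_oob_fraction(n), 4), round(simulated_oob_fraction(n, rng), 4))
print("1/e =", round(math.exp(-1), 4))
```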

For a continuous response, the predicted value for an observation is the average of its predicted values over the collection of individual trees. For a categorical response, the predicted probability for an observation is the average of its predicted probabilities over the collection of individual trees. The observation is classified into the level for which its predicted probability is the highest.
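As a small illustration of the averaging described above, the following sketch combines per-tree outputs for a single observation. The data structures (a list of numbers for a continuous response, a list of level-to-probability dictionaries for a categorical response) are assumptions made for this example, not the platform's internal representation.

```python
def average_prediction(tree_predictions):
    """Continuous response: average the predicted values over all trees."""
    return sum(tree_predictions) / len(tree_predictions)

def average_probabilities(tree_probabilities):
    """Categorical response: average each level's probability over all trees."""
    n = len(tree_probabilities)
    levels = tree_probabilities[0].keys()
    return {lvl: sum(p[lvl] for p in tree_probabilities) / n for lvl in levels}

def classify(tree_probabilities):
    """Assign the level with the highest averaged probability."""
    avg = average_probabilities(tree_probabilities)
    return max(avg, key=avg.get)

# Hypothetical outputs from three trees for one observation.
per_tree = [
    {"yes": 0.90, "no": 0.10},
    {"yes": 0.40, "no": 0.60},
    {"yes": 0.45, "no": 0.55},
]
print(average_probabilities(per_tree))
print(classify(per_tree))
```

In this example two of the three trees individually favor "no", but the averaged probability for "yes" is higher, so the observation is classified as "yes". Averaging probabilities is therefore not the same as taking a majority vote of the trees.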

For more information about bootstrap forests, see Hastie et al. (2009).
