Publication date: 07/30/2020

Launch the Bootstrap Forest platform by selecting Analyze > Predictive Modeling > Bootstrap Forest.

Figure 5.7 Bootstrap Forest Launch Window

For more information about the options in the Select Columns red triangle menu, see Column Filter Menu in Using JMP.

The Bootstrap Forest platform launch provides the following options:

Y, Response

The response variable or variables that you want to analyze.

X, Factor

The predictor variables.

Weight

A column whose numeric values assign a weight to each row in the analysis.

Freq

A column whose numeric values assign a frequency to each row in the analysis.

Validation

A numeric column that contains at most three distinct values. See Validation in the Partition Models section.

By

A column or columns whose levels define separate analyses. For each level of the specified column, the corresponding rows are analyzed using the other variables that you have specified. The results are presented in separate reports. If more than one By variable is assigned, a separate report is produced for each possible combination of the levels of the By variables.
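As a rough illustration of how By variables partition the rows, the following Python sketch groups rows by each combination of By-column levels that occurs in the data. It is not JMP code; the function name `split_by` and the column names `site` and `shift` are hypothetical.

```python
from collections import defaultdict

def split_by(rows, by_columns):
    """Group rows by each observed combination of By-column levels."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in by_columns)
        groups[key].append(row)
    return dict(groups)

# Hypothetical data table with two By columns: site and shift.
rows = [
    {"site": "A", "shift": "day", "y": 1.0},
    {"site": "A", "shift": "night", "y": 2.0},
    {"site": "B", "shift": "day", "y": 3.0},
    {"site": "A", "shift": "day", "y": 4.0},
]

# Each key would correspond to one separate analysis and report.
groups = split_by(rows, ["site", "shift"])
```

In this sketch, three combinations occur in the data, so three separate analyses would be run.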

Method

Enables you to select the partition method (Decision Tree, Bootstrap Forest, Boosted Tree, K Nearest Neighbors, or Naive Bayes). All of these methods other than Decision Tree are available in JMP Pro.

For more information about these methods, see Partition Models, Boosted Tree, K Nearest Neighbors, and Naive Bayes.

Validation Portion

The portion of the data to be used as the validation set. See Validation in the Partition Models section.
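The idea of holding out a portion of the rows can be sketched in Python as follows. This is an illustration only, not JMP's implementation: the function name `holdout_split`, the rounding of the validation count, and the use of a simple shuffle are all assumptions.

```python
import random

def holdout_split(n_rows, validation_portion, rng):
    """Randomly assign a portion of row indices to a validation set."""
    indices = list(range(n_rows))
    rng.shuffle(indices)
    # Assumption: the validation count is the portion times the row count, rounded.
    n_valid = round(validation_portion * n_rows)
    validation = sorted(indices[:n_valid])
    training = sorted(indices[n_valid:])
    return training, validation

rng = random.Random(42)
training, validation = holdout_split(20, 0.25, rng)
```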

Informative Missing

If selected, enables missing value categorization for categorical predictors and informative treatment of missing values for continuous predictors. See Informative Missing in the Partition Models section.

Ordinal Restricts Order

If selected, restricts consideration of splits to those that preserve the ordering.
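One way to read "splits that preserve the ordering" is that the levels of an ordinal predictor can be divided only at a cut point, so that all lower levels fall on one side and all higher levels on the other. The Python sketch below enumerates such splits under that interpretation; the function name `order_preserving_splits` is hypothetical, and this is not JMP code.

```python
def order_preserving_splits(levels):
    """List the binary splits that keep ordered levels contiguous.

    `levels` must already be in their ordinal order.
    """
    return [(levels[:i], levels[i:]) for i in range(1, len(levels))]

splits = order_preserving_splits(["low", "medium", "high"])
# A split such as (["low", "high"], ["medium"]) is never produced,
# because it would break the ordering.
```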

After you select OK in the launch window, the Bootstrap Forest Specification window appears.

Figure 5.8 Bootstrap Forest Specification Window

Number of Rows

The number of rows in the data table.

Number of Terms

The number of columns that are specified as predictors.

Number of Trees in the Forest

The number of trees to grow and then average.
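For a continuous response, averaging over trees can be sketched as taking the mean of the individual tree predictions for a row. The Python below is a minimal illustration with a hypothetical function name and made-up prediction values; it does not cover the categorical case.

```python
def forest_prediction(tree_predictions):
    """Average the predictions that the individual trees make for one row."""
    return sum(tree_predictions) / len(tree_predictions)

# Hypothetical predictions from a three-tree forest for a single row.
prediction = forest_prediction([2.0, 3.0, 4.0])
```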

Number of Terms Sampled per Split

The number of predictors to consider as splitting candidates at each split. For each split, a new random sample of predictors is taken as the candidate set.
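The per-split sampling of predictors can be sketched in Python as drawing a fresh random subset each time a split is evaluated. The function name `sample_split_candidates` and the predictor names are hypothetical, and this is an illustration rather than JMP's implementation.

```python
import random

def sample_split_candidates(predictors, n_terms, rng):
    """Draw a new random subset of predictors to consider at one split."""
    return rng.sample(predictors, n_terms)

rng = random.Random(1)
predictors = ["age", "dose", "weight", "height", "site"]

# Three consecutive splits, each with its own candidate set of two predictors.
candidates_per_split = [
    sample_split_candidates(predictors, 2, rng) for _ in range(3)
]
```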

Bootstrap Sample Rate

The proportion of observations to sample (with replacement) for growing each tree. A new random sample is generated for each tree.
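Sampling with replacement at a given rate can be sketched in Python as follows. The function name `bootstrap_rows` is hypothetical, and rounding the number of draws to the nearest integer is an assumption, not a documented JMP behavior.

```python
import random

def bootstrap_rows(n_rows, sample_rate, rng):
    """Draw row indices with replacement for growing one tree."""
    # Assumption: the number of draws is the rate times the row count, rounded.
    n_draws = round(sample_rate * n_rows)
    return [rng.randrange(n_rows) for _ in range(n_draws)]

rng = random.Random(7)

# A new sample is drawn for each tree; the same row can appear more than once.
samples = [bootstrap_rows(10, 0.7, rng) for _ in range(4)]
```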

Minimum Splits Per Tree

The minimum number of splits for each tree.

Maximum Splits Per Tree

The maximum number of splits for each tree.

Minimum Size Split

The minimum number of observations required for a candidate split.

Early Stopping

(Available only if validation is used.) If selected, the process stops growing additional trees if the additional trees do not improve the validation statistic. The validation statistic is the validation set’s Entropy RSquare value for a categorical response and its RSquare value for a continuous response. If not selected, the process continues until the specified number of trees is reached.
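The early stopping logic can be sketched in Python as tracking the validation statistic after each tree is added and stopping once it no longer improves. The exact stopping rule that JMP uses is not described here, so the `patience` parameter, the function name, and the example values are assumptions made for illustration.

```python
def trees_grown(validation_stats, max_trees, early_stopping=True, patience=1):
    """Return how many trees are grown before the process stops.

    validation_stats[i] is the validation statistic (for example, RSquare)
    of the forest after tree i + 1 is added.
    """
    best = float("-inf")
    since_best = 0
    grown = 0
    for stat in validation_stats[:max_trees]:
        grown += 1
        if stat > best:
            best = stat
            since_best = 0
        else:
            since_best += 1
        # Hypothetical rule: stop after `patience` trees without improvement.
        if early_stopping and since_best >= patience:
            break
    return grown

stats = [0.40, 0.52, 0.58, 0.57, 0.60]  # made-up validation RSquare values
with_stopping = trees_grown(stats, max_trees=5)
without_stopping = trees_grown(stats, max_trees=5, early_stopping=False)
```

With early stopping, this sketch halts at the fourth tree, the first one that fails to improve the statistic. Without it, all five trees are grown.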

Multiple Fits over Number of Terms

If selected, creates a bootstrap forest for each of several values of the number of terms sampled per split. The model for which results are displayed is the one with the largest validation set Entropy RSquare value (for a categorical response) or RSquare value (for a continuous response).

The lower bound of the range is the value specified for Number of Terms Sampled per Split. The upper bound is specified by the following option:

Max Number of Terms

The maximum number of terms to consider for a split.
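The selection across several values of the number of terms can be sketched in Python as below. The sketch assumes that every integer between the lower and upper bounds is tried, which the text does not state. The function name and the validation RSquare values are hypothetical.

```python
def best_number_of_terms(lower, upper, validation_rsquare):
    """Pick the number of terms whose forest has the largest validation RSquare."""
    # Assumption: every integer from lower through upper is fitted.
    candidates = range(lower, upper + 1)
    return max(candidates, key=validation_rsquare)

# Made-up validation RSquare values, keyed by number of terms sampled per split.
scores = {2: 0.61, 3: 0.66, 4: 0.64}
best = best_number_of_terms(2, 4, scores.get)
```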

Use Tuning Table Design

Opens a window where you can select a data table containing values for the Forest panel tuning parameters, called a tuning design table. A tuning design table has a column for each option that you want to specify and one or more rows, each of which represents a single Bootstrap Forest model design. If an option is not specified in the tuning design table, the default value is used.

For each row in the table, JMP creates a Bootstrap Forest model using the tuning parameters specified. If more than one model is specified in the tuning design table, the Model Validation-Set Summaries report lists the RSquare value for each model. The Bootstrap Forest report shows the fit statistics for the model with the largest RSquare value.

You can create a tuning design table using the Design of Experiments facilities. A bootstrap forest tuning design table can contain the following case-insensitive columns in any order:

– Number Trees

– Number Terms

– Portion Bootstrap

– Minimum Splits per Tree

– Maximum Splits per Tree

– Minimum Size Split
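The way one row of a tuning design table could be resolved into a full set of options can be sketched in Python as follows. Column names are matched without regard to case, and any option missing from the row falls back to a default. The default values shown are placeholders, not JMP's actual defaults, and skipping unrecognized columns is an assumption of this sketch.

```python
KNOWN_COLUMNS = [
    "Number Trees",
    "Number Terms",
    "Portion Bootstrap",
    "Minimum Splits per Tree",
    "Maximum Splits per Tree",
    "Minimum Size Split",
]

def resolve_design(row, defaults):
    """Merge one tuning-table row with defaults, matching names case-insensitively."""
    canonical = {name.lower(): name for name in KNOWN_COLUMNS}
    settings = dict(defaults)
    for column, value in row.items():
        name = canonical.get(column.strip().lower())
        if name is not None:  # assumption: unrecognized columns are skipped
            settings[name] = value
    return settings

# Placeholder defaults for illustration only; not taken from JMP.
defaults = {
    "Number Trees": 100,
    "Number Terms": 3,
    "Portion Bootstrap": 1.0,
    "Minimum Splits per Tree": 10,
    "Maximum Splits per Tree": 2000,
    "Minimum Size Split": 5,
}

design = resolve_design({"number trees": 50, "PORTION BOOTSTRAP": 0.8}, defaults)
```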

Suppress Multithreading

If selected, all calculations are performed on a single thread.

Random Seed

Specify a nonzero numeric random seed to reproduce the results in future launches of the platform. By default, the Random Seed is set to zero, which does not produce reproducible results. When you save the analysis to a script, the random seed that you enter is saved to the script.
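The effect of a nonzero seed can be illustrated with Python's own random number generator. This is an analogy, not JMP's generator: the functions `make_rng` and `draw` are hypothetical, and treating a seed of zero as "unseeded" mirrors the behavior described above only by convention in this sketch.

```python
import random

def make_rng(seed):
    """Return a generator: reproducible for a nonzero seed, unseeded for zero."""
    if seed == 0:
        return random.Random()  # seeded from system entropy, not reproducible
    return random.Random(seed)

def draw(seed, n=5):
    """Draw n random row indices in the range 0 to 99 using the given seed."""
    rng = make_rng(seed)
    return [rng.randrange(100) for _ in range(n)]

first = draw(123)
second = draw(123)  # same nonzero seed, so the same draws
```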
