Model Validation-Set Summaries

Publication date: 04/30/2021

Model Validation-Set Summaries

(Available when you select the Multiple Fits over Number of Terms option in the Bootstrap Forest Specification window.) Provides fit statistics for all of the models that were fit. See Figure 5.10 and Multiple Fits Panel.

Specifications

Shows the settings used in fitting the model.

Overall Statistics

Provides fit statistics for the training set, and for the validation and test sets if they are specified. The specific form of the report depends on the modeling type of the response.

Suppose that multiple models are fit using the Multiple Fits over Number of Terms option in the Bootstrap Forest Specification window. Then the Overall Statistics and Cumulative Validation reports display results for the model with the largest validation-set Entropy RSquare (for a categorical response) or RSquare (for a continuous response).
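The selection rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not JMP's implementation; the function name `select_displayed_model`, the dictionary keys, and the values are hypothetical.

```python
def select_displayed_model(candidates, statistic):
    """Return the candidate with the largest validation-set statistic.

    `statistic` is the dictionary key to compare, for example a
    validation Entropy RSquare (categorical response) or a validation
    RSquare (continuous response). Key names are assumptions.
    """
    return max(candidates, key=lambda model: model[statistic])


# Hypothetical validation results for three fits with different numbers of terms.
fits = [
    {"n_terms": 2, "validation_rsquare": 0.61},
    {"n_terms": 4, "validation_rsquare": 0.68},
    {"n_terms": 6, "validation_rsquare": 0.66},
]
best = select_displayed_model(fits, "validation_rsquare")
print(best["n_terms"])
```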

Categorical Response

Measures Report

Gives the following statistics for the training set, and for the validation and test sets if they are specified.

Note: For Entropy RSquare and Generalized RSquare, values closer to 1 indicate a better fit. For Mean -Log p, RMSE, Mean Abs Dev, and Misclassification Rate, smaller values indicate a better fit.

Entropy RSquare

A measure of fit that compares the log-likelihoods from the fitted model and the constant probability model. It ranges from 0 to 1. See Entropy RSquare in the Statistical Details section.
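As a rough illustration, the comparison of log-likelihoods can be written as one minus the ratio of the fitted model's log-likelihood to that of the constant probability model. The sketch below assumes that form and assumes that the constant model uses the observed level proportions of the same data; the function name and input layout are hypothetical, and the Statistical Details section remains the authoritative definition.

```python
import math
from collections import Counter


def entropy_rsquare(labels, fitted_probs):
    """1 - loglik(model) / loglik(constant model).

    labels: observed response levels, one per observation.
    fitted_probs: one dict per observation mapping level -> fitted probability.
    Assumes at least two distinct levels are observed; otherwise the
    constant-model log-likelihood is zero and the ratio is undefined.
    """
    n = len(labels)
    # Log-likelihood of the fitted model: log of the probability
    # assigned to the level that actually occurred.
    ll_model = sum(math.log(p[y]) for y, p in zip(labels, fitted_probs))
    # Log-likelihood of the constant model (assumption: level proportions
    # are taken from the same observations).
    ll_const = sum(c * math.log(c / n) for c in Counter(labels).values())
    return 1.0 - ll_model / ll_const


labels = ["a", "a", "b", "b"]
sharp = [{"a": 0.9, "b": 0.1}, {"a": 0.8, "b": 0.2},
         {"a": 0.2, "b": 0.8}, {"a": 0.1, "b": 0.9}]
print(entropy_rsquare(labels, sharp))
```

A model whose fitted probabilities equal the level proportions gives a value of 0, and values approach 1 as the fitted probability of each observed level approaches 1.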

Generalized RSquare

A measure that can be applied to general regression models. It is based on the likelihood function L and is scaled to have a maximum value of 1. The value is 1 for a perfect model, and 0 for a model no better than a constant model. The Generalized RSquare measure simplifies to the traditional RSquare for continuous normal responses in the standard least squares setting. Generalized RSquare is also known as the Nagelkerke or Cragg and Uhler R2, which is a normalized version of Cox and Snell’s pseudo R2.
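The normalization can be sketched with the standard Nagelkerke form: Cox and Snell's pseudo R2, 1 - (L0/L)^(2/n), divided by its maximum attainable value, 1 - L0^(2/n), where L0 is the likelihood of the constant model. The code below works with log-likelihoods for numerical stability. The function and argument names are hypothetical, and this is not presented as JMP's internal computation.

```python
import math


def generalized_rsquare(loglik_model, loglik_constant, n):
    """Nagelkerke (Cragg and Uhler) R2 from two log-likelihoods.

    loglik_model: log-likelihood of the fitted model.
    loglik_constant: log-likelihood of the constant (intercept-only) model.
    n: number of observations.
    """
    # Cox and Snell's pseudo R2: 1 - (L0 / L)^(2/n).
    cox_snell = 1.0 - math.exp(2.0 * (loglik_constant - loglik_model) / n)
    # Largest value Cox and Snell's measure can reach: 1 - L0^(2/n).
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_constant / n)
    return cox_snell / max_cox_snell


# Hypothetical log-likelihoods for 20 observations.
print(generalized_rsquare(-6.0, -13.0, 20))
```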

Mean -Log p

The average of negative log(p), where p is the fitted probability associated with the event that occurred.

RMSE

The root mean square error, adjusted for degrees of freedom. The differences are between 1 and p, the fitted probability for the response level that actually occurred.

Mean Abs Dev

The average of the absolute values of the differences between the response and the predicted response. The differences are between 1 and p, the fitted probability for the response level that actually occurred.

Misclassification Rate

The proportion of observations for which the response category with the highest fitted probability is not the observed category.

N

The number of observations.
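The probability-based measures above (Mean -Log p, RMSE, Mean Abs Dev, and Misclassification Rate) can be sketched together, since each depends on p, the fitted probability of the level that actually occurred. The function name and input layout are hypothetical. The RMSE here divides by the number of observations and applies no degrees-of-freedom adjustment, because the exact adjustment is not stated in this section.

```python
import math


def categorical_measures(labels, fitted_probs):
    """Fit measures for a categorical response.

    labels: observed response levels, one per observation.
    fitted_probs: one dict per observation mapping level -> fitted probability.
    """
    n = len(labels)
    # p = fitted probability of the level that actually occurred.
    p_obs = [p[y] for y, p in zip(labels, fitted_probs)]
    # Predicted level = level with the highest fitted probability.
    predicted = [max(p, key=p.get) for p in fitted_probs]
    return {
        "Mean -Log p": sum(-math.log(p) for p in p_obs) / n,
        # Assumption: plain division by n, no degrees-of-freedom adjustment.
        "RMSE": math.sqrt(sum((1.0 - p) ** 2 for p in p_obs) / n),
        "Mean Abs Dev": sum(abs(1.0 - p) for p in p_obs) / n,
        "Misclassification Rate": sum(
            1 for y, yhat in zip(labels, predicted) if y != yhat
        ) / n,
        "N": n,
    }


measures = categorical_measures(
    ["a", "b"],
    [{"a": 0.8, "b": 0.2}, {"a": 0.6, "b": 0.4}],
)
print(measures)
```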

Confusion Matrix

(Available only for categorical responses.) Shows classification statistics for the training set, and for the validation and test sets if they are specified.
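A confusion matrix of this kind can be sketched by tallying actual against predicted levels, where the predicted level is taken to be the one with the highest fitted probability. The function name and data are hypothetical, and the layout of the JMP report may differ.

```python
from collections import Counter


def confusion_counts(labels, fitted_probs):
    """Count (actual, predicted) pairs for one data set.

    Run it separately on the training, validation, and test rows
    to get one matrix per set.
    """
    counts = Counter()
    for actual, probs in zip(labels, fitted_probs):
        predicted = max(probs, key=probs.get)  # most probable level
        counts[(actual, predicted)] += 1
    return counts


matrix = confusion_counts(
    ["a", "a", "b"],
    [{"a": 0.9, "b": 0.1}, {"a": 0.3, "b": 0.7}, {"a": 0.2, "b": 0.8}],
)
print(matrix[("a", "b")])
```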

Decision Matrix

(Available only for categorical responses and if the response has a Profit Matrix column property or if you specify costs using the Specify Profit Matrix option.) Gives Decision Count and Decision Rate matrices for the training set, and for the validation and test sets if they are specified. See Additional Examples of Partitioning in the Partition Models section.

Continuous Response

Individual Trees Report

Gives RMSE values, which are averaged over all trees, for In Bag and Out of Bag observations. Training set observations that are used to construct a tree are called in-bag observations. Training observations that are not used to construct a tree are called out-of-bag (OOB) observations.

For each tree, the Out of Bag RMSE is the square root of the quantity SSE/N, where SSE is the sum of squared errors over the tree's OOB observations and N is the number of those observations. The squared Out of Bag RMSE for each tree is given in the Per-Tree Summaries report as OOB SSE/N.
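The per-tree calculation can be sketched as follows. The function names and the data are hypothetical; the averaging step reflects the statement that the reported RMSE values are averaged over all trees, and is not taken from JMP's source.

```python
import math


def oob_rmse(actual, predicted):
    """Square root of (OOB SSE / N) for a single tree.

    actual, predicted: response values and tree predictions for the
    training observations that were NOT used to build this tree.
    """
    sse = sum((y - yhat) ** 2 for y, yhat in zip(actual, predicted))
    return math.sqrt(sse / len(actual))


def mean_oob_rmse(per_tree):
    """Average the per-tree OOB RMSE over all trees.

    per_tree: list of (actual, predicted) pairs, one pair per tree.
    """
    return sum(oob_rmse(a, p) for a, p in per_tree) / len(per_tree)


# Two hypothetical trees with their out-of-bag observations.
trees = [
    ([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]),  # SSE = 4, N = 3
    ([0.0], [3.0]),                      # SSE = 9, N = 1
]
print(mean_oob_rmse(trees))
```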

RSquare and RMSE Report

Gives RSquare, root mean square error (RMSE), and the number of observations for the training set, and for the validation and test sets if they are specified.
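For a continuous response, these quantities can be sketched with the textbook definitions: RSquare as 1 - SSE/SST and RMSE as the square root of SSE/N. Two assumptions are made here that this section does not confirm: the total sum of squares is centered on the mean of the same set being scored, and no degrees-of-freedom adjustment is applied. The function name is hypothetical.

```python
import math


def rsquare_and_rmse(actual, predicted):
    """RSquare, RMSE, and N for one set (training, validation, or test)."""
    n = len(actual)
    mean_y = sum(actual) / n
    sse = sum((y - yhat) ** 2 for y, yhat in zip(actual, predicted))
    # Assumption: total sum of squares uses the mean of this same set.
    sst = sum((y - mean_y) ** 2 for y in actual)
    return {"RSquare": 1.0 - sse / sst, "RMSE": math.sqrt(sse / n), "N": n}


# Hypothetical validation-set responses and forest predictions.
report = rsquare_and_rmse([1.0, 2.0, 3.0], [1.5, 2.0, 2.5])
print(report)
```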
