Validation Options in Stepwise Regression

Validation is the process of using part of a data set to estimate model parameters and using the other part to assess the predictive ability of the model.
• The training set is used to estimate model parameters.
• The validation set is used during model fitting to assess the predictive ability of the model.
• The test set is a final, independent assessment of the predictive ability of the model. The test set is available only when using a validation column.
The training, validation, and test sets are created as subsets of the original data. To define these subsets, specify a validation column in the Fit Model launch window.
The values in the validation column determine how the data are split and what method is used for validation.
• If the column has only one distinct value, then no validation is performed.
• If the column has two distinct values, then training and validation sets are created.
• If the column has three distinct values, then training, validation, and test sets are created.
• If the column has more than three distinct values, then k-fold cross validation is performed.
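The rules above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not JMP's implementation: the function names validation_method and split_rows are hypothetical, and the mapping of sorted column values to training, validation, and test (smallest to largest) is an assumption that the text above does not state.

```python
from collections import defaultdict


def validation_method(column):
    """Return the validation method implied by the number of distinct values."""
    n_distinct = len(set(column))
    if n_distinct == 0:
        raise ValueError("validation column is empty")
    if n_distinct == 1:
        return "none"
    if n_distinct == 2:
        return "training/validation"
    if n_distinct == 3:
        return "training/validation/test"
    return "k-fold"


def split_rows(column):
    """Group row indices by role for two- and three-value columns.

    Assumption: sorted values map, in order, to training, validation, test.
    Returns an empty dict when no holdout split applies.
    """
    method = validation_method(column)
    if method not in ("training/validation", "training/validation/test"):
        return {}
    roles = ["training", "validation", "test"]
    role_of = dict(zip(sorted(set(column)), roles))
    groups = defaultdict(list)
    for row, value in enumerate(column):
        groups[role_of[value]].append(row)
    return dict(groups)
```

For example, split_rows([0, 1, 2, 0, 1]) assigns rows 0 and 3 to training, rows 1 and 4 to validation, and row 2 to test under the stated assumption.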
When a validation column is used, model fit statistics are given for the training, validation, and test sets in the Stepwise Control report. Models that are run from the control panel include the validation column.
For more information about how a validation column is used in JMP modeling platforms, see “Validation in JMP Modeling” in Predictive and Specialized Modeling.
Note: There is an option to use k-fold cross validation for stepwise regression models in JMP and JMP Pro. Click the Stepwise Fit red triangle and select K-Fold Crossvalidation.
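As a rough illustration of the general k-fold idea, the following Python sketch partitions row indices into k folds and holds out each fold once while the remaining rows are used for fitting. The function name kfold_splits is hypothetical, and the deterministic interleaved fold assignment is an assumption made for simplicity; it does not describe how JMP assigns rows to folds.

```python
def kfold_splits(n_rows, k):
    """Return a list of (training_rows, holdout_rows) pairs, one per fold.

    Assumption: rows are assigned to folds by interleaving (row i goes to
    fold i mod k); real software typically assigns folds at random.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    folds = [list(range(start, n_rows, k)) for start in range(k)]
    splits = []
    for holdout in folds:
        held = set(holdout)
        training = [row for row in range(n_rows) if row not in held]
        splits.append((training, holdout))
    return splits
```

Each row appears in exactly one holdout fold, so every observation is used once for assessment and k - 1 times for estimation.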