• The training set is the part that estimates model parameters.
• The validation set is the part that assesses or validates the predictive ability of the model.
• The test set is a final, independent assessment of the model’s predictive ability. The test set is available only when using a validation column (see Descriptions of Launch Window).
The training, validation, and test sets are created by subsetting the original data into parts. Validation Methods describes several methods for subsetting a data set.
Randomly divides the original data into the training and validation data sets. Use the Validation Portion option on the platform launch window (see Descriptions of Launch Window) to specify the proportion of the original data to hold back as the validation data set.
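The random holdback split described above can be sketched as follows. This is an illustrative sketch only, not JMP's implementation; the function name `holdback_split` and its arguments are hypothetical.

```python
import random

def holdback_split(n_rows, validation_portion, seed=0):
    """Randomly assign each row index to the training or validation set.

    `validation_portion` plays the role of the Validation Portion setting:
    the fraction of rows held back for validation.
    """
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_valid = round(n_rows * validation_portion)
    # The first n_valid shuffled rows become the validation (holdback) set.
    validation = sorted(indices[:n_valid])
    training = sorted(indices[n_valid:])
    return training, validation

train, valid = holdback_split(100, 0.25)
print(len(train), len(valid))  # 75 25
```

Every row lands in exactly one of the two sets, so the model is estimated on the training rows and assessed on rows it never saw.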
Randomly divides the original data into K subsets. In turn, each of the K sets is used to validate the model fit on the rest of the data, fitting a total of K models. The final model is selected based on the cross-validation RSquare, where a constraint is imposed to avoid overfitting the model. This method is useful for small data sets, because it makes efficient use of limited amounts of data. See KFold Crossvalidation.
Note: KFold validation is available only with the Decision Tree method. To use KFold, select K Fold Crossvalidation from the platform red-triangle menu. See Platform Options.
When you select the K Fold Crossvalidation option, an outline called Crossvalidation appears. The results in this outline update as you split the decision tree. Or, if you click Go, the outline shows the results for the final model. See Crossvalidation Report.
In KFold crossvalidation, the entire set of observations is partitioned into K subsets, called folds. Each fold is treated as a holdback sample and the remaining observations serve as a training set.
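The fold construction described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not JMP's implementation; the helper `kfold_partition` is hypothetical.

```python
import random

def kfold_partition(n_rows, k, seed=0):
    """Partition row indices into K disjoint folds of near-equal size."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    # Deal the shuffled rows round-robin into k folds.
    return [sorted(indices[i::k]) for i in range(k)]

folds = kfold_partition(20, 4)
for i, holdout in enumerate(folds):
    training = [row for j, fold in enumerate(folds) if j != i for row in fold]
    # Fit the model on `training`, then validate it on `holdout`;
    # across the loop, every fold serves once as the holdback sample.
```

Because each row appears in exactly one fold, every observation is used K−1 times for training and exactly once for validation, which is why the method makes efficient use of small data sets.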