Publication date: 05/24/2021

To launch the Model Screening platform, select Analyze > Predictive Modeling > Model Screening.

Figure 10.3 The Model Screening Launch Window

For more information about the options in the Select Columns red triangle menu, see Column Filter Menu in Using JMP.

Y, Response

The response variable or variables that you want to analyze.

X, Factor

The predictor variables.

Weight

(Not applicable to the K Nearest Neighbors, Support Vector Machines, or Neural modeling platforms.) A column whose numeric values assign a weight to each row in the analysis.

Freq

(Not applicable to the K Nearest Neighbors modeling platform.) A column whose numeric values assign a frequency to each row in the analysis.

Validation

(Not applicable if any of the Crossvalidation options are selected in the launch window.) A numeric column that defines the validation sets. If you click the Validation button with no columns selected in the Select Columns list, you can add a validation column to your data table. For more information about the Make Validation Column utility, see Make Validation Column.

Note: If you specify a validation column with more than three levels, this column is used to perform K Fold crossvalidation.
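The role of the number of levels can be sketched in Python. This is only an illustration, not JMP code: the `validation_scheme` helper is hypothetical, the treatment of two and three levels (training/validation and training/validation/test) is an assumed convention, and using the number of levels as K is also an assumption. Only the "more than three levels" rule comes from the note above.

```python
def validation_scheme(column):
    """Describe how a validation column might be used, based on its distinct levels."""
    levels = sorted(set(column))
    if len(levels) < 2:
        raise ValueError("a validation column needs at least two levels")
    if len(levels) == 2:
        return "training/validation"        # assumed convention
    if len(levels) == 3:
        return "training/validation/test"   # assumed convention
    # More than three levels: treated as K Fold crossvalidation (per the note above).
    # Taking K to be the number of levels is an assumption of this sketch.
    return f"K Fold crossvalidation with K = {len(levels)}"

print(validation_scheme([0, 0, 1, 0, 1]))
print(validation_scheme([0, 1, 2, 0, 1]))
print(validation_scheme([1, 2, 3, 4, 5]))
```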

By

A column or columns whose levels define separate analyses. For each level of the specified column, the corresponding rows are analyzed using the other variables that you have specified. The results are presented in separate reports. If more than one By variable is assigned, a separate report is produced for each possible combination of the levels of the By variables.
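As a rough illustration of how By variables partition the rows, the following Python sketch groups rows by the combination of their By levels. The row layout, the column names, and the `split_by` helper are hypothetical and are not part of JMP. The sketch creates groups only for combinations that actually occur in the data.

```python
from collections import defaultdict

def split_by(rows, by_columns):
    """Group rows (dicts) by the tuple of their By-column values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in by_columns)
        groups[key].append(row)
    return dict(groups)

rows = [
    {"site": "A", "shift": "day", "y": 1.2},
    {"site": "A", "shift": "night", "y": 0.9},
    {"site": "B", "shift": "day", "y": 1.5},
    {"site": "A", "shift": "day", "y": 1.1},
]

# Each key corresponds to one separate analysis (and one separate report).
groups = split_by(rows, ["site", "shift"])
for key, members in sorted(groups.items()):
    print(key, len(members))
```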

Methods

Enables you to select the desired modeling platforms. By default, the modeling platforms that are fit are Decision Tree (Partition), Bootstrap Forest, Boosted Tree, K Nearest Neighbors, Neural, Support Vector Machines, Discriminant, Fit Least Squares, Fit Stepwise, Logistic Regression, and Generalized Regression. Naive Bayes, Partial Least Squares, and XGBoost are also available.

Notes:

– XGBoost is not supported by JMP and is available only if the XGBoost add-in is installed. For more information about XGBoost, see https://community.jmp.com.

– Decision Tree (Partition), Discriminant, and Partial Least Squares all require some type of validation set in order to fit a model.

– If there are fewer than 20 observations in a validation set, a Decision Tree (Partition) model cannot be fit.

– The modeling platforms use default options and tuning parameters in model fitting. You might be able to improve on the default fit by launching a platform directly and choosing different options.
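The validation requirements in these notes can be summarized as a small Python check. This is a sketch under stated assumptions: the `can_fit` helper and the method name strings are hypothetical, and the check encodes only the two rules listed above, not any other conditions that JMP might apply.

```python
# Methods that need some type of validation set (from the notes above).
NEEDS_VALIDATION = {"Decision Tree", "Discriminant", "Partial Least Squares"}
MIN_TREE_VALIDATION_ROWS = 20

def can_fit(method, validation_rows):
    """Return True if the method passes the two rules above.

    validation_rows is the number of rows in the validation set,
    or None when no validation set is available.
    """
    if method in NEEDS_VALIDATION and not validation_rows:
        return False
    if method == "Decision Tree" and validation_rows < MIN_TREE_VALIDATION_ROWS:
        return False
    return True

print(can_fit("Decision Tree", 19))   # fewer than 20 validation rows
print(can_fit("Decision Tree", 20))
print(can_fit("Discriminant", None))  # no validation set
print(can_fit("Neural", None))        # not covered by either rule
```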

The launch window also provides the following additional options.

Remove Live Reports

Does not include the individual model platform reports in the Model Screening report window.

Tip: Select this option to free up memory when you have a large problem with many methods and fits.

Log Methods

Writes out a progress message to the log each time a fitting platform is called.

Time Limit Each

Specifies a time limit, in seconds, for each fit. For platforms that support early stopping, the best estimates up to that point are provided. For platforms that do not support early stopping, no result is provided.
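The difference between platforms that support early stopping and those that do not can be sketched in Python. This is an illustration only, not JMP's implementation: the `fit_with_time_limit` function, the list of candidate estimates, and the fake clock (used so that the example is repeatable) are all hypothetical.

```python
def make_fake_clock(step=1.0):
    """Return a clock that advances by `step` seconds on every call (repeatable demo)."""
    now = [0.0]
    def clock():
        now[0] += step
        return now[0]
    return clock

def fit_with_time_limit(candidates, time_limit, clock, supports_early_stopping):
    """Scan (estimate, score) candidates in order until done or out of time."""
    start = clock()
    best = None
    for estimate, score in candidates:
        if clock() - start > time_limit:
            # Time is up: keep the best estimate only if early stopping is supported.
            return best if supports_early_stopping else None
        if best is None or score > best[1]:
            best = (estimate, score)
    return best  # the fit finished within the limit

candidates = [("m1", 0.60), ("m2", 0.72), ("m3", 0.70), ("m4", 0.80)]
stopped = fit_with_time_limit(candidates, 2.5, make_fake_clock(), True)
dropped = fit_with_time_limit(candidates, 2.5, make_fake_clock(), False)
finished = fit_with_time_limit(candidates, 100, make_fake_clock(), False)
print(stopped, dropped, finished)
```

In this sketch, the fit that supports early stopping returns its best estimate so far, the fit that does not returns nothing, and a fit that completes within the limit is unaffected.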

Set Random Seed

Sets a random seed that is used for any random components of the model fit routines. This enables you to rerun the platform and obtain the same model fits.
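The effect of a fixed seed can be illustrated with Python's standard `random` module. This is a generic sketch of seeded reproducibility, not a description of JMP's random number generator. The `random_fold_labels` helper is hypothetical.

```python
import random

def random_fold_labels(n_rows, k, seed):
    """Assign each row a fold label 0..k-1 in a random but seed-determined order."""
    rng = random.Random(seed)          # private generator, so the seed fully fixes the result
    labels = [i % k for i in range(n_rows)]
    rng.shuffle(labels)
    return labels

first = random_fold_labels(10, 5, seed=123)
second = random_fold_labels(10, 5, seed=123)
print(first == second)  # the same seed reproduces the same assignment
```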

The launch window provides the following options for various types of crossvalidation.

K Fold Crossvalidation

Divides the data randomly into K parts or folds. Each model is fit to the data K times, each time with a different fold held out as a crossvalidation set. A total of K models are fit. The default value of K is 5.

– K specifies the number of folds for K Fold Crossvalidation. The default is 5 and K must be greater than 1.
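The K Fold procedure described above can be sketched in Python as follows. This is a minimal illustration, not JMP's implementation: the `k_fold_splits` helper is hypothetical, rows are identified by index, and the way rows are dealt into folds after shuffling is an assumption.

```python
import random

def k_fold_splits(n_rows, k=5, seed=None):
    """Return K (training_rows, held_out_rows) pairs, one per fold."""
    if k <= 1:
        raise ValueError("K must be greater than 1")
    if k > n_rows:
        raise ValueError("K cannot exceed the number of rows")
    rows = list(range(n_rows))
    random.Random(seed).shuffle(rows)           # divide the data randomly
    folds = [rows[i::k] for i in range(k)]      # K parts of nearly equal size
    splits = []
    for held_out in range(k):
        training = [r for i, fold in enumerate(folds) if i != held_out for r in fold]
        splits.append((training, folds[held_out]))
    return splits

splits = k_fold_splits(23, k=5, seed=1)
print(len(splits))  # one fit per fold
for training, held_out in splits:
    print(len(training), len(held_out))
```

Each row is held out exactly once, so every model fit is evaluated on data that it was not trained on.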

Nested Crossvalidation

Divides the data into nested folds for crossvalidation. First, the data are divided into K equal parts, or folds, indexed by k = 1, ..., K. For each k, the kth fold is used as a test set and the remaining data are divided further into L equal parts. These L subdivisions are called inner folds. Then, a model is fit to the data L times with a different inner fold held out each time as a crossvalidation set. The L models then use the kth fold as a common test data set. In all, a total of K*L models are fit. The default value of K is 4 and the default value of L is 5.

For example, set K = 2 and L = 3. The data are initially divided into two folds. The first fold is held out as a test set and the second fold is divided into 3 inner folds. Three models are fit to the data, each time with a different inner fold held out as a crossvalidation set. Then, all three models are tested on the first fold.

The second fold is then held out as a test set and the first fold is divided into 3 inner folds. Three models are fit to the data, each time with a different inner fold held out as a crossvalidation set. Then, all three models are tested on the second fold.

– K specifies the number of folds for Nested Crossvalidation. The default is 4 and K must be greater than 1.

– L specifies the number of inner folds for Nested Crossvalidation. The default is 5 and L must be greater than 1.

Note: If both K Fold Crossvalidation and Nested Crossvalidation are selected, Nested Crossvalidation is performed.
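The nested procedure, including the K = 2 and L = 3 example above, can be sketched in Python. This is an illustration only: the `nested_fit_plan` helper is hypothetical, rows are identified by index and assumed to be already in random order, and the way rows are dealt into folds is an assumption.

```python
def nested_fit_plan(rows, k=4, l=5):
    """List the K*L fits, each with its training, crossvalidation, and test rows."""
    if k <= 1 or l <= 1:
        raise ValueError("K and L must both be greater than 1")
    outer = [rows[i::k] for i in range(k)]            # K outer folds
    plan = []
    for test_index, test_fold in enumerate(outer):
        # Everything outside the kth fold is divided into L inner folds.
        remaining = [r for i, fold in enumerate(outer) if i != test_index for r in fold]
        inner = [remaining[j::l] for j in range(l)]
        for held_index, held_out in enumerate(inner):
            training = [r for j, fold in enumerate(inner) if j != held_index for r in fold]
            plan.append({"training": training, "validation": held_out, "test": test_fold})
    return plan

plan = nested_fit_plan(list(range(12)), k=2, l=3)
print(len(plan))  # 6, which is K * L
for fit in plan:
    print(len(fit["training"]), len(fit["validation"]), len(fit["test"]))
```

In each fit, the training, crossvalidation, and test rows do not overlap, and the L fits that share an outer fold are all tested on the same rows.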

Repeated K Fold

Specifies the number of times the K Fold Crossvalidation or Nested Crossvalidation process is repeated.
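Repetition of the K Fold process can be sketched in Python. This is a sketch under stated assumptions, not JMP's implementation: the `repeated_k_fold` helper is hypothetical, and the assumption that each repetition draws a new random division of the data is not stated in the description above.

```python
import random

def repeated_k_fold(n_rows, k=5, repeats=3, seed=0):
    """Return (repeat, fold, held_out_rows) for every fit across all repetitions."""
    rng = random.Random(seed)
    plan = []
    for repeat in range(repeats):
        rows = list(range(n_rows))
        rng.shuffle(rows)                        # assumed: a fresh random division per repeat
        folds = [rows[i::k] for i in range(k)]
        for fold_index, held_out in enumerate(folds):
            plan.append((repeat, fold_index, held_out))
    return plan

plan = repeated_k_fold(20, k=5, repeats=3)
print(len(plan))  # repeats * K fits in this sketch
```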

The launch window provides the following additional options for the modeling platforms.

Add Two Way Interactions

Adds all two-way interaction effects to linear models.

Add Quadratics

Adds effects for the squares of continuous variables to linear models.
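The effects that the Add Two Way Interactions and Add Quadratics options contribute can be enumerated with a short Python sketch. The `expand_effects` helper, the factor names, and the `a*b` notation for effects are hypothetical and are used only for illustration.

```python
from itertools import combinations

def expand_effects(factors, continuous, two_way=True, quadratics=True):
    """Return main effects plus optional two-way interactions and squared terms."""
    effects = list(factors)
    if two_way:
        # Every unordered pair of distinct factors.
        effects += [f"{a}*{b}" for a, b in combinations(factors, 2)]
    if quadratics:
        # Squares are added only for continuous factors.
        effects += [f"{x}*{x}" for x in factors if x in continuous]
    return effects

effects = expand_effects(["age", "dose", "sex"], continuous={"age", "dose"})
print(effects)
# ['age', 'dose', 'sex', 'age*dose', 'age*sex', 'dose*sex', 'age*age', 'dose*dose']
```

With p factors, the interaction option adds p(p - 1)/2 effects, so the number of model terms grows quickly as factors are added.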

Informative Missing

Enables informative missing for all platforms.

Additional Methods

Calls several additional methods, such as Ridge, Elastic Net, and Lasso, in the Generalized Regression platform. See Generalized Regression Models in Fitting Linear Models.

Caution: This results in additional model fits.

When you click OK, the specified models are fit and a set of progress bars is shown. The upper progress bar reports the progress across all fits. The lower progress bar reports the progress for the current individual model fit. If you stop the lower progress bar, early stopping is applied to the current fit and the upper progress bar continues to run.
