Assess Variable Importance

For continuous responses, the Variable Importance report calculates indices that measure the importance of factors in a model in a way that is independent of the model type and fitting method. The fitted model is used only in calculating predicted values. The method estimates the variability in the predicted response based on a range of variation for each factor. If variation in the factor causes high variability in the response, then that effect is important relative to the model.

Assess Variable Importance can also be accessed in the Profiler that is obtained through the Graph menu.

For statistical details, see Assess Variable Importance. See also Saltelli, 2002.
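The idea of variance-based importance indices can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software's implementation: it assumes independent inputs, uses the commonly cited Saltelli and Jansen Monte Carlo estimators for the main and total effects, and the names importance_indices, predict, and sample_inputs are hypothetical. As in the description above, the fitted model enters only through predict, which returns a predicted value for one row of factor settings.

```python
import random


def importance_indices(predict, sample_inputs, n=5000, seed=1):
    """Estimate main-effect and total-effect indices for each factor.

    predict:       function mapping a list of factor values to a prediction.
    sample_inputs: function taking a random.Random and returning one list
                   of independently drawn factor values (an assumption).
    """
    rng = random.Random(seed)
    a = [sample_inputs(rng) for _ in range(n)]  # first sample matrix
    b = [sample_inputs(rng) for _ in range(n)]  # second, independent matrix
    ya = [predict(x) for x in a]
    yb = [predict(x) for x in b]

    # Variance of the predicted response over the sampled inputs.
    pooled = ya + yb
    mean = sum(pooled) / len(pooled)
    variance = sum((y - mean) ** 2 for y in pooled) / (len(pooled) - 1)

    main, total = [], []
    for i in range(len(a[0])):
        # Rows of A with only factor i replaced by the value from B.
        yab = [predict(ra[:i] + [rb[i]] + ra[i + 1:]) for ra, rb in zip(a, b)]
        # Main effect: variance explained by factor i alone. Centering yb
        # reduces Monte Carlo noise without changing the expected value.
        main_var = sum(
            (fb - mean) * (fab - fa) for fa, fb, fab in zip(ya, yb, yab)
        ) / n
        # Total effect: variance attributable to factor i, alone and in
        # combination with other factors (Jansen estimator).
        total_var = sum((fa - fab) ** 2 for fa, fab in zip(ya, yab)) / (2 * n)
        main.append(main_var / variance)
        total.append(total_var / variance)
    return main, total


if __name__ == "__main__":
    # Hypothetical model with an interaction between the first two factors.
    model = lambda x: x[0] + 2 * x[1] + 3 * x[0] * x[1]
    draw = lambda rng: [rng.random(), rng.random(), rng.random()]
    print(importance_indices(model, draw))
```

For a purely additive model the main and total indices agree up to sampling noise. When factors interact, the total index for a factor exceeds its main index.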

The Assess Variable Importance Report

The Assess Variable Importance menu has the following options that address the methodology used in constructing importance indices:

Independent Uniform Inputs

For each factor, Monte Carlo samples are drawn from a uniform distribution defined by the minimum and maximum observed values. Use this option when you believe that your factors are uncorrelated and that their likely values are uniformly spread over the range represented in the study.

Independent Resampled Inputs

For each factor, Monte Carlo samples are obtained by resampling its set of observed values. Use this option when you believe that your factors are uncorrelated and that their likely values are not represented by a uniform distribution.

Dependent Resampled Inputs

Factor values are constructed from observed combinations using a k-nearest neighbors approach, in order to account for correlation. This option treats observed variance and covariance as representative of the covariance structure for your factors. Use this option when you believe that your factors are correlated. Note that this option is sensitive to the number of rows in the data table. If used with a small number of rows, the results can be unreliable.

Linearly Constrained Inputs

For each factor, Monte Carlo samples are drawn from a uniform distribution over a region defined by linear constraints. The linear constraints may be defined in the Profiler or constructed in connection with a designed experiment. In addition, the samples are restricted to fall within the minimum and maximum observed values. Use this option in the presence of linear constraints, when you believe that these constraints impact the distribution of the inputs.

The speed of these algorithms depends on the model evaluation speed. In general, the fastest option is Independent Uniform Inputs and the slowest is Dependent Resampled Inputs. If the estimation process does not complete immediately, you have the option to Accept Current Indices.

Note: In the case of independent and linearly constrained inputs, variable importance indices are constructed using Monte Carlo sampling. For this reason, you can expect some variation in importance index values from one run to another.
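The three Monte Carlo sampling schemes described above can be sketched as follows. This is an illustration only: the function names are hypothetical, columns is assumed to be a list holding the observed values of each factor, and rejection sampling is one simple way to sample uniformly over a linearly constrained region, not necessarily the method that the software uses. The Dependent Resampled Inputs option is not sketched here because its k-nearest neighbors construction is not detailed in this section.

```python
import random


def uniform_inputs(columns, rng):
    """Independent Uniform Inputs: each factor is drawn uniformly
    between its minimum and maximum observed values."""
    return [rng.uniform(min(col), max(col)) for col in columns]


def resampled_inputs(columns, rng):
    """Independent Resampled Inputs: each factor is drawn separately
    from its own set of observed values."""
    return [rng.choice(col) for col in columns]


def constrained_inputs(columns, constraints, rng, max_tries=10000):
    """Linearly Constrained Inputs (rejection-sampling sketch).

    constraints is a list of (coefficients, bound) pairs, each meaning
    sum(coefficient * value) <= bound. Candidates are drawn uniformly
    within the observed ranges and kept only if they satisfy every
    constraint.
    """
    for _ in range(max_tries):
        x = uniform_inputs(columns, rng)
        if all(
            sum(c * v for c, v in zip(coefs, x)) <= bound
            for coefs, bound in constraints
        ):
            return x
    raise RuntimeError("no feasible point found; check the constraints")


if __name__ == "__main__":
    rng = random.Random(0)
    observed = [[1.0, 2.0, 5.0], [10.0, 20.0, 30.0]]  # made-up values
    print(uniform_inputs(observed, rng))
    print(resampled_inputs(observed, rng))
    # Hypothetical constraint: x0 + x1 <= 25
    print(constrained_inputs(observed, [([1.0, 1.0], 25.0)], rng))
```

Because each call draws random values, repeated runs with different seeds give different samples, which is why the importance indices vary slightly from one run to another.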

Variable Importance Report

Each Assess Variable Importance option presents a Summary Report and Marginal Model Plots. When the Assess Variable Importance report opens, the factors in the Profiler are reordered according to their Total Effect importance indices. When there are multiple responses, the factors are reordered according to the Total Effect importance indices in the Overall report. When you run several Variable Importance reports, the factors in the Profiler are ordered according to their Total Effect indices in the most recent report.

Summary Report

For each response, a table displays the following elements:

Column

The factor of interest.

Main Effect

An importance index that reflects the relative contribution of that factor alone, not in combination with other factors.

Total Effect

An importance index that reflects the relative contribution of that factor both alone and in combination with other factors. The Total Effect column is displayed as a bar chart. See Weights.

Main Effect Std Error

The Monte Carlo standard error of the Main Effect’s importance index. This is a hidden column that you can access by right-clicking in the report and selecting Columns > Main Effect Std Error. By default, sampling continues until this error is less than 0.01. Details of the calculation are given in Variable Importance Standard Errors. (Not available for Dependent Resampled Inputs option.)

Total Effect Std Error

The Monte Carlo standard error of the Total Effect’s importance index. This is a hidden column that you can access by right-clicking in the report and selecting Columns > Total Effect Std Error. By default, sampling continues until this error is less than 0.01. Details of the calculation are given in Variable Importance Standard Errors. (Not available for Dependent Resampled Inputs option.)

Weights

A plot that shows the Total Effect indices, located to the right of the final column. You can deselect or reselect this plot by right-clicking in the report and selecting Columns > Weights.

Proportion of function evaluations with missing values

The proportion of Monte Carlo samples for which some combination of inputs results in an inestimable prediction. When the proportion is nonzero, this message appears as a note at the bottom of the table.

Note: When you have more than one response, the Summary Report presents an Overall table followed by tables for each response. The importance indices in the Overall report are the averages of the importance indices across all responses.
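The Overall table can be illustrated with a short sketch. The function name overall_indices and the index values are made up for illustration. The code averages each factor's index across responses, as the note describes, and sorts the factors in decreasing order of the averaged index, which is the order used for the Profiler when the Total Effect indices are supplied.

```python
def overall_indices(per_response):
    """Average importance indices across responses.

    per_response maps each response name to a dict of
    {factor name: importance index}. Returns the averaged indices and
    the factor names sorted from most to least important.
    """
    factors = list(next(iter(per_response.values())))
    count = len(per_response)
    overall = {
        f: sum(table[f] for table in per_response.values()) / count
        for f in factors
    }
    order = sorted(factors, key=lambda f: overall[f], reverse=True)
    return overall, order


if __name__ == "__main__":
    # Made-up Total Effect indices for two responses and two factors.
    total_effect = {
        "Y1": {"A": 0.5, "B": 0.25},
        "Y2": {"A": 0.25, "B": 0.75},
    }
    print(overall_indices(total_effect))
```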

Marginal Model Plots

The Marginal Model Plots report (see Marginal Model Plots for Four Responses) shows a matrix of plots, with a row for each response and columns for the factors. The factors are ordered according to the size of their overall Total Effect importance indices.

For a given response and factor, the plot shows the mean response for each factor value, where that mean is taken over all inputs to the calculation of importance indices. These plots differ from profiler plots, which show cross sections of the response. Marginal Model Plots are useful for assessing the main effects of factors.

Note that your choice of input methodology impacts the values plotted on marginal model plots. Also, because the plots are based on the generated input settings, the plotted mean responses might not follow a smooth curve.
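The difference between a marginal mean and a profiler cross section can be made concrete with a small sketch. It assumes that the sampled input rows and their predicted responses are already available. The name marginal_means and the equal-width binning are illustrative choices, not the software's smoothing method. For one factor, the code averages the predictions over all sampled rows whose value for that factor falls in the same bin, so the other factors are averaged out rather than held fixed.

```python
def marginal_means(samples, predictions, factor, bins=10):
    """Mean predicted response by binned value of one factor.

    samples:     list of rows, each a list of factor values.
    predictions: predicted response for each row.
    factor:      index of the factor of interest.
    Returns (bin center, mean prediction) pairs for nonempty bins.
    """
    values = [row[factor] for row in samples]
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant factor
    sums = [0.0] * bins
    counts = [0] * bins
    for v, y in zip(values, predictions):
        b = min(int((v - lo) / width), bins - 1)  # top edge goes in last bin
        sums[b] += y
        counts[b] += 1
    return [
        (lo + (b + 0.5) * width, sums[b] / counts[b])
        for b in range(bins)
        if counts[b]
    ]


if __name__ == "__main__":
    # Made-up grid of settings and a hypothetical model y = 2*x0 + x1.
    rows = [[0.0, 0.0], [0.0, 10.0], [1.0, 0.0], [1.0, 10.0]]
    preds = [2 * r[0] + r[1] for r in rows]
    print(marginal_means(rows, preds, factor=0, bins=2))
```

Because each bin mean depends on which input rows were generated, neighboring bins can differ by sampling noise, which is why the plotted means might not follow a smooth curve.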

The red triangle options enable you to show or hide the following aspects of the plots:

Estimate

A smoothed estimate of the mean of the simulated values calculated as a function of the factor values.

Note: The estimates of the mean are simulated, so the values change when you rerun the analysis.

Confidence Interval

A 95% confidence band for the simulated means. This band is often narrow and might not be visible unless you expand the scale. Not available for Dependent Resampled Inputs.

Note: The confidence bounds are simulated, so the bands change when you rerun the analysis.

Data

The actual (unsimulated) values of the response plotted against the factor values.

Variable Importance Options

The Variable Importance report has the following red triangle options:

Reorder factors by main effect importance

Reorders the cells in the Profiler in accordance with the importance indices for the main effects (Main Effect).

Reorder factors by total importance

Reorders the cells in the Profiler in accordance with the total importance indices for the factors (Total Effect).

Colorize Profiler

Colors cells in the profiler by Total Effect importance indices using a red to white intensity scale.

Note: You can click rows in the Summary Report to select columns in the data table. This can facilitate further analyses.

Examples

A Neural Network Example

The Boston Housing.jmp sample data table contains data on 13 factors that might relate to median home values. You will fit a model using a neural network. Because neural networks do not accommodate formal hypothesis tests, these tests are not available to help assess which variables are important in predicting the response. However, for this purpose, you can use the Assess Variable Importance profiler option.

Note that your results will differ from, but should resemble, those shown here. There are two sources of random variability in this example. When you fit the neural network, k-fold cross validation is used. This partitions the data into training and validation sets at random. Also, Monte Carlo sampling is used to calculate the factor importance indices.

1.	Select Help > Sample Data Library and open Boston Housing.jmp.

2.	Select Analyze > Modeling > Neural.

3.	Select mvalue from the Select Columns list and click Y, Response.

4.	Select all other columns from the Select Columns list and click X, Factor.

5.	Click OK.

6.	In the Neural Model Launch panel, select KFold from the list under Validation Method.

When you select KFold, the Number of Folds defaults to 5.

7.	Click Go.

8.	From the red triangle menu for the Model NTanH(3) report, select Profiler.

The Prediction Profiler is displayed at the very bottom of the report. Note the order of the factors for later comparison.

Because the factors are correlated, you take this into account by choosing Dependent Resampled Inputs as the sampling method for assessing variable importance.

9.	From the red triangle menu next to Prediction Profiler, select Assess Variable Importance > Dependent Resampled Inputs.

The Variable Importance: Dependent Resampled Inputs report appears (Dependent Resampled Inputs Report). Check that the Prediction Profiler cells have been reordered by the magnitude of the Total Effect indices in the report. In Dependent Resampled Inputs Report, check that the Total Effect importance indices identify rooms and lstat as the factors that have the most impact on the predicted response.

Dependent Resampled Inputs Report

You might be interested in comparing the importance indices obtained when the factors are assumed to be correlated with those obtained when the factors are assumed to be independent.

10.	From the red triangle menu next to Prediction Profiler, select Assess Variable Importance > Independent Resampled Inputs.

The resampled inputs option makes sense in this example, because the distributions involved are not uniform. The Variable Importance: Independent Resampled Inputs report is shown in Independent Resampled Inputs Report. Check that the two factors identified as having the most impact on the predicted values are lstat and rooms. Note that the ordering of their importance indices is reversed from the ordering using Dependent Resampled Inputs.

Independent Resampled Inputs Report

Variable Importance for Multiple Responses

The data in the Tiretread.jmp sample data table are the result of a designed experiment where the factors are orthogonal. For this reason, you use importance estimates based on independent inputs. Suppose that you believe that, in practice, factor values vary throughout the design space, rather than taking only the settings defined in the experiment. Then you should choose Independent Uniform Inputs as the sampling scheme for your importance indices.

1.	Select Help > Sample Data Library and open Tiretread.jmp.

2.	Run the script RSM for 4 Responses.

The Prediction Profiler is displayed at the very bottom of the report.

3.	From the red triangle menu next to Prediction Profiler, select Assess Variable Importance > Independent Uniform Inputs.

The Summary Report is shown in Summary Report for Four Responses. Because the importance indices are based on random sampling, your estimates might differ slightly from those shown in the figure.

The report shows tables for each of the four responses. The Overall table averages the factor importance indices across responses. The factors in the Profiler (Profiler for Four Responses) have been reordered to match their ordering on the Overall table’s Total Effect importance.

Summary Report for Four Responses

4.	From the red triangle menu next to Variable Importance: Independent Uniform Inputs, select Colorize Profiler.

Colors from a red to white intensity scale are overlaid on profiler panels to reflect Total Effect importance. For example, you can easily see that the most important effect is that of Silane on Hardness.

Profiler for Four Responses

The Marginal Model Plots report (Marginal Model Plots for Four Responses) shows mean responses for each factor across a uniform distribution of settings for the other two factors.

Marginal Model Plots for Four Responses