The Missing Value Report

A low-rank approximation of a matrix has the form X = UDV' and can be viewed as an extension of the singular value decomposition (SVD). ADI uses the Soft-Impute method as its imputation model and is designed so that the data determine the rank of the low-rank approximation.
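The following is a minimal sketch of a Soft-Impute style iteration, written with NumPy, to illustrate how a soft-thresholded SVD can fill missing cells and let the data set the rank. It is not the ADI implementation. The function name `soft_impute`, the tuning parameter name `lam`, and the stopping rule are assumptions made for this illustration.

```python
import numpy as np

def soft_impute(X, lam, max_iter=100, tol=1e-6):
    """Fill NaNs in X by repeated soft-thresholded SVD (illustrative sketch)."""
    mask = ~np.isnan(X)                  # True where a value is observed
    Z = np.where(mask, X, 0.0)           # start with missing cells set to 0
    for _ in range(max(1, max_iter)):
        U, d, Vt = np.linalg.svd(Z, full_matrices=False)
        d_shrunk = np.maximum(d - lam, 0.0)    # soft-threshold the singular values
        low_rank = (U * d_shrunk) @ Vt         # low-rank approximation U D V'
        Z_new = np.where(mask, X, low_rank)    # keep observed cells, refill missing
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    rank = int(np.sum(d_shrunk > 0))     # rank is the count of surviving singular values
    return Z, low_rank, rank

# Hypothetical example: a rank-1 matrix with one cell removed.
M = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
X = M.copy()
X[1, 1] = np.nan
filled, low_rank, rank = soft_impute(X, lam=1.0)
```

Larger values of `lam` shrink more singular values to zero, so the rank of the approximation falls as the tuning parameter grows. This is the sense in which the data, through the chosen tuning parameter, determine the rank.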

The ADI algorithm performs the following steps:

1. The data are partitioned into training and validation sets.

2. Each set is centered and scaled using the observed values from the training set.

3. For each partitioned data set, additional missing values are added within each column and are referred to as induced missing (IM) values.

4. The imputation model is fit on the training data set along a solution path of tuning parameters. The IM values are used to determine the best value for the tuning parameter.

5. Additional rank reduction is performed using the training data set by de-biasing the results from the chosen imputation model in step 4.

6. Final rank reduction is performed to calibrate the model for streaming data and to prevent overfitting. This is done by fitting the imputation model on the validation set, using the rank determined in step 5 as an upper bound.
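The first three steps, and the way IM values can score a candidate tuning parameter, can be sketched as follows. This is an illustration under stated assumptions, not the ADI code: the helper names (`partition_rows`, `scale_with_training_stats`, `induce_missing`, `im_error`) are hypothetical, and root mean squared error is assumed as the scoring criterion. The proportions 0.3 and 0.2 are the documented defaults.

```python
import numpy as np

def partition_rows(n_rows, validation_prop, rng):
    """Randomly split row indices into training and validation sets."""
    order = rng.permutation(n_rows)
    n_val = int(round(validation_prop * n_rows))
    return order[n_val:], order[:n_val]          # (training, validation)

def scale_with_training_stats(train, other):
    """Center and scale both sets using observed training values only."""
    mean = np.nanmean(train, axis=0)
    std = np.nanstd(train, axis=0)
    std[std == 0] = 1.0                          # guard against constant columns
    return (train - mean) / std, (other - mean) / std

def induce_missing(X, im_prop, rng):
    """Hide a proportion of the observed cells in each column (IM values)."""
    X_im = X.copy()
    im_mask = np.zeros(X.shape, dtype=bool)
    for j in range(X.shape[1]):
        observed = np.flatnonzero(~np.isnan(X[:, j]))
        n_hide = int(round(im_prop * observed.size))
        hidden = rng.choice(observed, size=n_hide, replace=False)
        im_mask[hidden, j] = True
    X_im[im_mask] = np.nan
    return X_im, im_mask

def im_error(truth, imputed, im_mask):
    """RMSE on the IM cells, whose true values are known (assumed criterion)."""
    diff = truth[im_mask] - imputed[im_mask]
    return float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical data: 20 rows, 4 columns, one genuinely missing cell.
rng = np.random.default_rng(1234)
X = rng.normal(size=(20, 4))
X[0, 0] = np.nan
train_idx, val_idx = partition_rows(20, 0.3, rng)
train, val = scale_with_training_stats(X[train_idx], X[val_idx])
train_im, train_mask = induce_missing(train, 0.2, rng)
```

An imputation model would then be fit to `train_im` for each candidate tuning parameter, and the candidate with the smallest `im_error(train, imputed, train_mask)` would be retained.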

Automated Data Imputation Controls

The ADI utility contains options for saving the imputed values and advanced controls.

Figure 2.14 ADI Controls

Options for Saving Imputed Values

Three options are available for saving the values imputed by the ADI method:

Create New Data Table

Creates a new data table that has the same dimensions as the original data table. In the new data table, the columns selected in the launch window contain the imputed values.

Save Scoring Formula to Current Data Table

Saves a column group named Imputed_ to the current data table. The group contains the imputed columns specified in the launch window. A hidden column, ADI Impute Column, is also added to the current data table. This column contains the imputed vectors and the scoring formula used in the data imputation. The column formulas update automatically when rows are added to the data table, which enables missing data imputation for streaming data. This is the default option.
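To illustrate how a scoring formula can impute newly added rows without refitting, the sketch below projects the observed entries of a new row onto a fixed low-rank basis and reconstructs the missing entries from it. This is one common approach and is an assumption here; the source does not describe the form of the ADI scoring formula. The names `score_row`, `V`, `mean`, and `std` are hypothetical.

```python
import numpy as np

def score_row(x, V, mean, std):
    """Impute NaNs in one new row using fixed loadings V (p x r) from training."""
    obs = ~np.isnan(x)
    z = (x - mean) / std                   # apply the training centering and scaling
    # Least-squares coordinates of the observed entries in the low-rank basis.
    coef, *_ = np.linalg.lstsq(V[obs], z[obs], rcond=None)
    z_hat = V @ coef                       # low-rank reconstruction of the full row
    return np.where(obs, x, z_hat * std + mean)

# Hypothetical rank-1 loadings and a new row with one missing value.
V = np.array([[1.0], [2.0], [2.0]]) / 3.0
row = np.array([3.0, np.nan, 6.0])
filled = score_row(row, V, mean=np.zeros(3), std=np.ones(3))
```

Because the basis and the scaling constants are fixed after training, each new row can be scored independently, which is what makes this style of formula suitable for streaming data.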

Impute Values in Place

Imputes the missing values in the current data table. The imputed values are displayed in light blue.

Advanced Controls

Contains the following advanced controls, with recommended settings based on the data:

Dimension Upper Bound

Determines the maximum rank allowed in the low-rank approximation. The recommended setting is determined by the dimensions of the matrix formed by the chosen columns.

Maximum Iterations

Determines the number of tuning parameter values that are evaluated when selecting the tuning parameter for the imputation model. The default is 10.
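As a rough illustration of a set of tuning parameter values to iterate over, the sketch below builds a decreasing, log-spaced grid that starts at the largest singular value of the zero-filled matrix, a common starting point for Soft-Impute style paths because the thresholded solution is zero there. The grid construction, the function name `lambda_path`, and the `ratio` argument are assumptions; only the default count of 10 comes from the text above.

```python
import numpy as np

def lambda_path(X, n_values=10, ratio=0.01):
    """Return a decreasing, log-spaced grid of candidate tuning parameters."""
    mask = ~np.isnan(X)
    Z0 = np.where(mask, X, 0.0)                        # zero-fill the missing cells
    lam_max = np.linalg.svd(Z0, compute_uv=False)[0]   # largest singular value
    return np.geomspace(lam_max, ratio * lam_max, num=n_values)

# Hypothetical 2 x 2 example with one missing cell.
X = np.array([[3.0, np.nan], [0.0, 4.0]])
path = lambda_path(X)
```

Each value on the grid corresponds to one fit of the imputation model, so a larger count gives a finer search at a higher computational cost.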

Proportion of Observations to Induce as Missing

Determines the proportion of observations that are set to missing as IM values in the training and validation sets. The default proportion for each set is 0.2.

Proportion of Rows to Use for Validation

Determines the proportion of rows assigned to the validation set. The remaining rows form the training set. The default proportion for the validation set is 0.3.

Set Random Seed

Determines the random seed for ADI. Use this option to obtain reproducible results.