Process Description

Missing Value Imputation

One of the problems complicating the analysis of genomic data sets is the prevalence of missing values.

The Missing Value Imputation process replaces missing values in a data matrix with values computed from nonmissing values in the same row. Imputation is performed rowwise: imputation statistics are computed separately for each row in the input data set. You can optionally define groups of columns so that imputation is performed groupwise within each row.
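The rowwise and groupwise behavior described above can be sketched in a few lines of Python. This is only an illustration, not the JMP Genomics implementation: it assumes the imputation statistic is the mean of the nonmissing values, and the function names (impute_row, impute_matrix) and the sample data are hypothetical.

```python
import math
from statistics import mean


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_row(row, groups=None):
    """Return a copy of `row` with missing values filled in.

    `groups` is a list of lists of column indices. When omitted, the whole
    row is treated as a single group. The fill value is the mean of the
    nonmissing values in the same group (an assumption for this sketch).
    """
    if groups is None:
        groups = [list(range(len(row)))]
    filled = list(row)
    for columns in groups:
        observed = [row[c] for c in columns if not is_missing(row[c])]
        if not observed:
            continue  # nothing to compute from, so leave the values missing
        fill_value = mean(observed)
        for c in columns:
            if is_missing(row[c]):
                filled[c] = fill_value
    return filled


def impute_matrix(matrix, groups=None):
    """Impute each row independently of every other row."""
    return [impute_row(row, groups) for row in matrix]


# Hypothetical data: two rows, four columns, None marks a missing value.
data = [
    [1.0, None, 3.0, 10.0],
    [None, 4.0, 6.0, None],
]

whole_row = impute_matrix(data)                           # one group per row
by_group = impute_matrix(data, groups=[[0, 1], [2, 3]])   # two column groups
```

With the whole row as one group, the second row is filled with the mean of 4.0 and 6.0. With two column groups, each missing value is filled only from the nonmissing values in its own group, so the same row is filled with 4.0 on the left and 6.0 on the right.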

What do I need?

Two data sets are required for this process.

The first data set, the Input Data Set, contains all of the numeric data. This data set must be in the wide format, where each gene corresponds to one row and each column corresponds to a separate experimental condition or array.

The second data set is the Experimental Design Data Set (EDDS). This required data set describes how the experiment was performed and provides information about the columns in the input data set. Note that one column in the EDDS must be named ColumnName, and the values in this column must exactly match the column names in the input data set.
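Because the match on ColumnName must be exact, it can help to check the two data sets against each other before running the process. The following Python sketch shows one way to do that outside of JMP Genomics. The helper name (unmatched_column_names), the Treatment variable, and all column names are hypothetical, and the comparison is assumed to be sensitive to case and to stray whitespace.

```python
def unmatched_column_names(input_columns, edds_rows):
    """Return the EDDS ColumnName values that have no exact match
    among the column names of the input data set."""
    available = set(input_columns)
    return [row["ColumnName"] for row in edds_rows
            if row["ColumnName"] not in available]


# Hypothetical input data set columns: one annotation column plus three arrays.
input_columns = ["GeneID", "array1", "array2", "array3"]

# Hypothetical EDDS: one row per array, with a design variable.
edds_rows = [
    {"ColumnName": "array1", "Treatment": "control"},
    {"ColumnName": "Array2", "Treatment": "control"},   # differs in case
    {"ColumnName": "array3 ", "Treatment": "treated"},  # trailing space
]

unmatched = unmatched_column_names(input_columns, edds_rows)
```

Here unmatched contains "Array2" and "array3 " (with the trailing space), neither of which is an exact match. The check runs only from the EDDS to the input data set, on the assumption that the input data set can also contain columns, such as annotation columns, that the EDDS does not list.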

For detailed information about the files and data sets used or created by JMP Genomics software, see Files and Data Sets.

Output/Results

The output of the Missing Value Imputation process includes one data set (with the suffix _mvi appended) containing both the existing input values and the imputed values.