Launch the Marker Imputation Platform

To impute the genotypes of the markers missing from your experimental data, launch the Marker Imputation platform by selecting Analyze > Life Sciences > Marker Imputation.

Figure 7.4 The Marker Imputation Launch Window

For information about the options in the Select Columns red triangle menu, see “Column Filter Menu” in Using JMP.

Marker

Select the marker columns that you want to impute and click Marker.

By

Produces a separate report for each level of the By variable. If more than one By variable is assigned, a separate report is produced for each possible combination of the levels of the By variables.

Ploidy

Enables you to specify the ploidy level of the organism under investigation. The value must be an even number.

Missing Marker Imputation Method

Use this option to specify how missing marker values are to be imputed.

– Select LD-kNN to impute the missing genotypes using the linkage disequilibrium k-nearest neighbor imputation (LD-kNNi) method.

The method has two steps. The first step is a pairwise comparison between each pair of markers. For each marker, the other markers are ranked from most closely to least closely correlated, and the x closest markers are selected, where x is the value entered in the Nearest Markers option. The second step is a pairwise comparison between each pair of samples. The samples are ranked from closest to farthest using the taxicab (Manhattan) distance, and the k closest samples are selected, where k is the value entered in the Nearest Samples option. Each missing genotype is then imputed by taking the mode of the closest x markers and closest k samples. This is repeated for each missing marker value.

– Select HWE Off to impute the missing genotypes with random draws from a multinomial distribution in which the frequency of each genotype class is set to be the observed frequency from the data.

– Select HWE On to impute the missing genotypes with random draws from a multinomial distribution in which the frequency of each genotype class is set to be the expected frequency under the assumption of the Hardy-Weinberg equilibrium (HWE).

– Select Random to randomly assign one of the acceptable values (0, 1, 2, ..., K), where K is the ploidy level.

– Select Specified to impute the missing genotypes with a specified integer between zero and the ploidy number.
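
The difference between the HWE Off, HWE On, Random, and Specified methods listed above can be illustrated with a short Python sketch. This is not JMP code. The function names (genotype_weights, impute_marker), the method labels, and the use of None for a missing value are all hypothetical. The HWE On branch assumes a biallelic marker whose expected genotype frequencies are binomial in the allele frequency, which for a diploid gives the familiar q², 2pq, and p² classes. How JMP computes these frequencies internally is not stated here.

```python
import random
from collections import Counter
from math import comb

def genotype_weights(observed, ploidy=2, hwe=False):
    """Probability of each genotype class 0..ploidy for one marker.

    Assumes `observed` holds at least one non-missing genotype.
    """
    n = len(observed)
    if not hwe:
        # HWE Off: use the observed frequency of each genotype class.
        counts = Counter(observed)
        return [counts[g] / n for g in range(ploidy + 1)]
    # HWE On (assumed form): estimate the frequency p of the counted allele,
    # then take binomial genotype frequencies.
    p = sum(observed) / (ploidy * n)
    return [comb(ploidy, g) * p**g * (1 - p) ** (ploidy - g)
            for g in range(ploidy + 1)]

def impute_marker(column, method="hwe_off", ploidy=2, value=0, seed=1):
    """Return a copy of `column` with each None replaced by an imputed value."""
    rng = random.Random(seed)  # plays the role of Set Random Seed
    observed = [g for g in column if g is not None]
    classes = list(range(ploidy + 1))
    if method == "specified":
        draw = lambda: value                # Specified: one fixed value
    elif method == "random":
        draw = lambda: rng.choice(classes)  # Random: uniform over 0..ploidy
    else:
        weights = genotype_weights(observed, ploidy, hwe=(method == "hwe_on"))
        draw = lambda: rng.choices(classes, weights=weights)[0]
    return [g if g is not None else draw() for g in column]

column = [0, 1, 2, None, 1, 1]
print(genotype_weights([g for g in column if g is not None]))
print(genotype_weights([g for g in column if g is not None], hwe=True))
print(impute_marker(column, method="hwe_on", seed=7))
```

In this example the observed genotype frequencies are 0.2, 0.6, and 0.2, whereas the assumed Hardy-Weinberg frequencies for an allele frequency of 0.5 are 0.25, 0.5, and 0.25, so the two methods draw from different distributions even on the same column.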

Nearest Samples

Use this option to specify the number of closest samples to use for the imputation.

Nearest Markers

Use this option to specify the number of closest markers to use for the imputation.

Set Random Seed

Use this option to specify a nonnegative integer to start the random number stream. Different values produce different outcomes of the algorithm.

Imputation Value

Use this option to specify a value to insert into any cell containing a missing value symbol.

Unthreaded

Use this option to suppress multi-threading. Deselect this option for improved computational speed.

Data Format

Most of the processes in JMP assume that the input table has a particular data structure. JMP distinguishes between tall and wide data tables. A tall data table has samples as columns and molecular entities (for example, markers, genes, clones, proteins, or metabolites) as rows. A wide data table is the transpose of the tall data table: it has samples as rows and molecular entities as columns.

When specifying the input data set for a process, it is important to know the required form. Marker Imputation requires a wide data table. The Transpose platform under the Tables menu enables you to transform your data tables between tall and wide forms.

Marker data must be encoded in the one-column, numeric format. Typically, in this format, diploid individuals homozygous for the least common, or minor allele, are represented in the table by a 2, whereas the heterozygotes are represented by a 1. Homozygotes for the most common allele are represented by a 0.
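
As a rough illustration of this encoding, the following Python sketch converts diploid genotype strings into minor-allele counts and transposes a tall table into wide form. It is not JMP code. The function names, the "A/G" string format, the "/" separator, and the use of None for a missing genotype are assumptions chosen for the example. In JMP itself, the transposition is done with the Transpose platform.

```python
from collections import Counter

def encode_marker(genotypes, sep="/"):
    """Encode one marker as counts of its minor allele (None stays missing)."""
    alleles = Counter()
    for g in genotypes:
        if g is not None:
            alleles.update(g.split(sep))
    ranked = alleles.most_common()  # most frequent allele first
    # The least frequent allele is treated as the minor allele.
    # A monomorphic marker has no minor allele, so every count is 0.
    minor = ranked[-1][0] if len(ranked) > 1 else None
    return [None if g is None else g.split(sep).count(minor)
            for g in genotypes]

def tall_to_wide(tall):
    """Transpose a tall table (markers as rows) to wide (samples as rows)."""
    return [list(row) for row in zip(*tall)]

# One marker across six samples: allele A occurs 6 times, G occurs 4 times,
# so G is the minor allele.
marker = ["A/A", "A/G", "G/G", "A/A", None, "A/G"]
print(encode_marker(marker))

# Three markers (rows) by two samples (columns) becomes two rows by three columns.
print(tall_to_wide([[0, 1], [2, 0], [1, 1]]))
```

If two alleles are equally frequent, this sketch picks one of them arbitrarily as the minor allele. A real pipeline would need an explicit rule for that case.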
