Impute Missing Genotypes



The Impute Missing Genotypes process imputes missing marker genotypes, coded numerically as 0, 1, or 2, for diploid organisms using either the k-nearest neighbor imputation (kNNi) method or the linkage disequilibrium k-nearest neighbor imputation (LD-kNNi) method¹. LD between markers is computed with SAS PROC ALLELE, distances between samples are computed with SAS PROC DISTANCE, and the k nearest neighbors of each sample are identified with SAS PROC MODECLUS².
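To make the two methods more concrete, the following is a minimal Python sketch of kNNi and LD-kNNi, loosely following the description in the LinkImpute paper¹. It is not the SAS implementation behind this process: the function names `impute`, `r_squared`, and `distance`, the parameters `k` (number of neighbors) and `l` (number of high-LD markers), the mean absolute distance, and the inverse-distance vote weighting are all illustrative assumptions. With `l=None` every other marker contributes to the sample-to-sample distance (kNNi); with an integer `l`, only the `l` markers in strongest LD (highest r²) with the target marker are used (LD-kNNi).

```python
import math
from typing import List, Optional

Genotype = Optional[int]  # 0, 1, 2, or None when the genotype is missing


def r_squared(x: List[Genotype], y: List[Genotype]) -> float:
    """Squared Pearson correlation (an LD measure) between two markers,
    using only samples observed at both markers."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return sxy * sxy / (sxx * syy)


def distance(s: List[Genotype], t: List[Genotype], markers: List[int]) -> float:
    """Mean absolute genotype difference over the given markers observed in both samples."""
    diffs = [abs(s[j] - t[j]) for j in markers if s[j] is not None and t[j] is not None]
    return sum(diffs) / len(diffs) if diffs else math.inf


def impute(data: List[List[Genotype]], k: int = 5,
           l: Optional[int] = None) -> List[List[Genotype]]:
    """Return a copy of data (rows = samples, columns = markers) with missing
    genotypes filled in. l=None gives kNNi; an integer l gives LD-kNNi."""
    n_samples, n_markers = len(data), len(data[0])
    columns = [[row[j] for row in data] for j in range(n_markers)]
    result = [row[:] for row in data]
    for m in range(n_markers):
        markers = [j for j in range(n_markers) if j != m]
        if l is not None:
            # LD-kNNi: keep only the l markers in strongest LD with marker m
            markers.sort(key=lambda j: r_squared(columns[m], columns[j]), reverse=True)
            markers = markers[:l]
        for s in range(n_samples):
            if data[s][m] is not None:
                continue
            # candidate neighbors: other samples with an observed genotype at m
            cands = [(distance(data[s], data[t], markers), data[t][m])
                     for t in range(n_samples) if t != s and data[t][m] is not None]
            cands = sorted(c for c in cands if math.isfinite(c[0]))
            if not cands:
                continue  # nothing comparable; leave the genotype missing
            votes = {0: 0.0, 1: 0.0, 2: 0.0}
            for d, g in cands[:k]:
                votes[g] += 1.0 / (d + 1e-6)  # closer neighbors get larger weights
            result[s][m] = max(votes, key=votes.get)
    return result
```

Restricting the distance calculation to markers in high LD with the target is what distinguishes LD-kNNi from plain kNNi: correlated markers carry most of the information about the missing genotype, while unlinked markers mostly add noise to the neighbor search.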

What do I need?

One SAS data set is required: an input data set with one column for each numerically coded marker (0 = homozygous for the major allele, 1 = heterozygous, 2 = homozygous for the minor allele).
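If genotypes start out as allele calls rather than numeric codes, a recoding step along these lines produces the 0/1/2 columns described above. This is a hedged Python sketch, not a JMP Genomics tool: the function name `encode_marker`, the two-letter call format (for example `"AG"`), and the use of `None` for a missing call are assumptions, and it assumes a biallelic marker whose major allele is the more frequent allele in the data.

```python
from collections import Counter
from typing import List, Optional


def encode_marker(calls: List[Optional[str]]) -> List[Optional[int]]:
    """Recode two-letter diploid calls (e.g. 'AG') for one marker as the number
    of non-major alleles: 0, 1, or 2. Missing calls (None or '') stay None."""
    counts = Counter(allele for call in calls if call for allele in call)
    if not counts:
        return [None] * len(calls)
    # the most common allele across all samples is treated as the major allele
    major = counts.most_common(1)[0][0]
    return [None if not call else sum(allele != major for allele in call)
            for call in calls]
```

For example, `encode_marker(["AA", "AG", "GG", "AA", None])` treats A as the major allele and gives one column of the input data set.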

A second data set, the Annotation Data Set, contains information, such as gene identity or chromosomal location, for each marker.

Output/Results

Output from this process is accessed from a Results window. Refer to the Impute Missing Genotypes output documentation for detailed descriptions and guides to interpreting your results.

¹ Money D, Gardner K, Migicovsky Z, Schwaninger H, Zhong G-Y, Myles S. 2015. LinkImpute: Fast and Accurate Genotype Imputation for Nonmodel Organisms. G3 5:2383-2390.

² Refer to the SAS PROC ALLELE, SAS PROC DISTANCE, and SAS PROC MODECLUS documentation for more information.