Standardize Genotypes

This process standardizes numerically coded genotypes to have a genetic mean of zero and a genetic variance of one. This is useful prior to predictive modeling in order to let each genotype have an equal effect on the response. The process also optionally imputes missing values with zeros.

What do I need?

One Input Data Set, containing all of the marker data is needed for this process. The sample data set used in the following example, the samplegmdata_numgeno data set, is partially shown below.

The original samplegmdata data set described in Data Sets Used in JMP Genomics Processes was computer generated and consists of 1000 rows of individuals with 130 columns corresponding to data on these individuals. Marker data is presented in the two-column allelic format. This data set was recoded using the Recode Genotypes process to generate samplegmdata_numgeno data set. The recoded data set consists of 611 rows with 70 columns. Marker data is presented as numeric variables in the one-column genotypic format.

Note: The marker variables in the input data set must contain numerically coded genotypes; a data set containing these can be obtained by checking the Create data set with numerically coded genotypes option when you run Marker Properties or by running the Recode Genotypes process.

Output/Results

This process generates a new output data set containing the standardized genotypes. Refer to the Standardize Genotypes output documentation for detailed descriptions of the output of this process.