Process Description

SNP Interaction Discovery

The SNP Interaction Discovery process uses penalized logistic regression (PLR) to search for pairs of SNPs predictive of a binary trait beyond the predictive ability of single SNPs (Park and Hastie, 2008).

PLR incorporates SNPs stepwise, first adding terms up the maximum requested and then sequentially deleting them. The best model is selected from the stepwise sequence. A SNP interaction term competes for inclusion if one of the SNPs is already in the model. The PLR penalty equals lambda times the sum of squares of the coefficients. You can specify a single value of lambda to get the desired list of interactions, or a suitable lambda can be found by cross validating the process over a range of lambdas.

A Forest model is available to pre-select SNPs likely to be part of an interaction. Alternatively, a Forest of trees can create candidate SNP interaction indicators that can then compete with single SNPs for predictive ability in PLR.

Note: This process is considered experimental.

What do I need?

One wide Input Data Set is needed to run the SNP Interaction Discovery process. This data set contains all of the numeric and other data to be analyzed. The samplegmdata_numgeno.sas7bdat data set, used in the following example, consists of 611 rows of individuals with 70 columns corresponding to data on these individuals. It was derived from the original, computer-generated, samplegmdata.sas7bdat data set described in Data Sets Used in JMP Genomics Processes, using the Recode Genotypes process. Marker data is presented in the one-column format. This data set is partially shown below.

For detailed information about the files and data sets used or created by JMP Genomics software, see Files and Data Sets.

Output/Results

The output generated by this process is summarized in a Tabbed report. Refer to the SNP Interaction Discovery output documentation for detailed descriptions and guides to interpreting your results.