Multiple SNP-Trait Association

The Multiple SNP-Trait Association process tests for association between traits of various types and numerically coded genotypes from multiple SNPs at a time. It uses logistic, linear, or survival regression models, as well as generalized linear mixed models, fit to the SNP genotypes themselves or to principal components that represent the SNP genotypes. These methods allow adjustment for quantitative covariates and random effects or, for some trait types, strata variables. For binary traits, Hotelling's T-squared test can be performed in addition to, or instead of, these models. For binary or continuous traits, the Sequence Kernel Association Test (SKAT) is also an option.
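To make the Hotelling's T-squared option concrete, the following is a minimal sketch of the standard two-sample statistic applied to numerically coded genotypes (0, 1, or 2) at several SNPs, comparing cases with controls. It is an illustration of the textbook formula only, not the code that the process runs; the function names and the toy genotype values are hypothetical.

```python
def mean_vector(rows):
    """Column means of a list of equal-length genotype rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def scatter_matrix(rows, mean):
    """Sum of outer products of deviations from the mean."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - mean[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                s[a][b] += d[a] * d[b]
    return s


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def hotelling_t2(cases, controls):
    """Return (T-squared, F statistic) for two groups of genotype rows.

    Under the usual assumptions, F follows an F distribution with
    p and n1 + n2 - p - 1 degrees of freedom, where p is the SNP count.
    """
    n1, n2, p = len(cases), len(controls), len(cases[0])
    m1, m2 = mean_vector(cases), mean_vector(controls)
    s1, s2 = scatter_matrix(cases, m1), scatter_matrix(controls, m2)
    pooled = [[(s1[a][b] + s2[a][b]) / (n1 + n2 - 2) for b in range(p)]
              for a in range(p)]
    diff = [m1[j] - m2[j] for j in range(p)]
    weights = solve(pooled, diff)  # pooled^{-1} @ diff without an explicit inverse
    t2 = (n1 * n2 / (n1 + n2)) * sum(diff[j] * weights[j] for j in range(p))
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f_stat


# Hypothetical toy data: rows are individuals, columns are two SNPs.
cases = [[0, 0], [1, 0], [0, 1], [1, 1]]
controls = [[1, 1], [2, 1], [1, 2], [2, 2]]
t2, f_stat = hotelling_t2(cases, controls)
```

Solving the linear system rather than inverting the pooled covariance matrix keeps the sketch short. The sketch fails with an error when SNPs in the group are perfectly correlated, which is one reason a principal-components representation can be useful for tightly linked markers.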

The Annotation Analysis Group Variable in the Annotation Data Set identifies the SNPs to be included in the same model together, and models are fit and statistics reported for each unique value of this variable (typically a gene) within an annotation group (typically a chromosome). P-values from these tests, with adjustments applied if requested, are plotted along the marker map, using the location of the first SNP in each annotation analysis group.
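The grouping described above can be sketched as follows. This is an illustration under stated assumptions, not the process's own code: the column names (snp, chromosome, gene, position) and the sample rows are hypothetical stand-ins for an annotation data set, and "first SNP" is taken to mean the first row encountered for each group.

```python
def group_snps(annotation_rows):
    """Collect SNPs per (annotation group, analysis group) pair.

    Each group records its SNPs in row order and the position of its
    first SNP, which is where the group's p-value would be plotted.
    """
    groups = {}
    for row in annotation_rows:
        key = (row["chromosome"], row["gene"])
        entry = groups.setdefault(
            key, {"snps": [], "plot_location": row["position"]}
        )
        entry["snps"].append(row["snp"])
    return groups


# Hypothetical annotation rows, one per marker (a tall layout).
annotation = [
    {"snp": "rs1", "chromosome": "1", "gene": "GENE_A", "position": 100},
    {"snp": "rs2", "chromosome": "1", "gene": "GENE_A", "position": 250},
    {"snp": "rs3", "chromosome": "1", "gene": "GENE_B", "position": 900},
    {"snp": "rs4", "chromosome": "2", "gene": "GENE_A", "position": 50},
]
groups = group_snps(annotation)
for (chromosome, gene), info in groups.items():
    print(chromosome, gene, info["snps"], info["plot_location"])
```

Keying on the pair rather than on the gene alone reflects the statement that models are fit per analysis group within each annotation group, so the same gene label on two chromosomes yields two separate models.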

See the MIXED, GLIMMIX, LOGISTIC, and PHREG procedures in the SAS/STAT User's Guide for more information.

What do I need?

Two data sets are needed for this process. The first required data set, the Input Data Set, contains all of the marker data. The sample data set used in the following example, morocco_snps1exp_rg.sas7bdat, contains data from a study of gene expression variation and SNP associations in southern Morocco (Idaghdour, Czika, et al., 2010). The genotypes have been recoded to numeric values using the Recode Genotypes process.

Note: The data was modified slightly to preserve anonymity of subjects.

The morocco_snps1exp_rg.sas7bdat data set lists genotype data at 4744 SNPs in 193 individuals. Marker data is presented in the one-column format. This data set is partially shown below. Note that this is a wide data set; markers are listed in columns, whereas individuals are listed in rows.

The second required data set is the Annotation Data Set. This data set contains information, such as gene identity or chromosomal location, for each of the markers. The morocco_anno_rg.sas7bdat annotation data set is used in this example. A portion of this data set is illustrated below. This data set is a tall data set; each row corresponds to a different marker.

Note: The top-to-bottom order of the rows in the annotation data set matches the left-to-right order of the columns in the input data set. This correspondence is required for markers to be matched appropriately.
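Before running the process on your own data, it can help to confirm this ordering. The following is a minimal sketch of such a check, assuming you have already extracted the marker column names from the wide input data set and the marker names from the tall annotation data set; the function name and the marker names shown are hypothetical.

```python
def find_order_mismatches(marker_columns, annotation_markers):
    """Compare input marker columns (left to right) with annotation rows
    (top to bottom) and return a list of (index, column, annotation)
    tuples for every position where the names differ.
    """
    if len(marker_columns) != len(annotation_markers):
        raise ValueError(
            "marker counts differ: %d columns vs %d annotation rows"
            % (len(marker_columns), len(annotation_markers))
        )
    return [
        (i, col, ann)
        for i, (col, ann) in enumerate(zip(marker_columns, annotation_markers))
        if col != ann
    ]


# Hypothetical marker names; non-marker columns such as sample IDs or
# traits are assumed to have been dropped from the column list already.
input_marker_columns = ["rs1", "rs2", "rs3"]
annotation_marker_rows = ["rs1", "rs3", "rs2"]
mismatches = find_order_mismatches(input_marker_columns, annotation_marker_rows)
print(mismatches)
```

An empty list means the two orderings agree. Any reported position indicates that one of the data sets should be reordered before the analysis.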

Both data sets are included in the Sample Data folder that comes with JMP Genomics.

For detailed information about the files and data sets used or created by JMP Life Sciences software, see Files and Data Sets.

Output/Results

The output generated by this process is summarized in a Tabbed report. Refer to the Multiple SNP-Trait Association output documentation for detailed descriptions and guides to interpreting your results.