Gene Set Scoring
The Gene Set Scoring process transforms a tall data set with genes as rows into a tall data set with pathways or categories as rows. The output data set consists of relevance scores for each pathway or category, computed from individual deviations from a reference value for each gene. This data set can then be used as input to processes from the Quality Control, Pattern Discovery, Row-by-Row Modeling, and Predictive Modeling menus to perform category-based inference.
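The reshaping described above, from gene rows to category rows, can be illustrated with a small Python sketch. This is not the algorithm that JMP Genomics uses. The toy data, the names expression, annotation, and score_gene_sets, the choice of each gene's mean across samples as the reference value, and the averaging of deviations within a category are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy input: one row per gene, one value per sample column.
expression = {
    "gene_a": {"s1": 1.0, "s2": 3.0},
    "gene_b": {"s1": 2.0, "s2": 6.0},
    "gene_c": {"s1": 5.0, "s2": 5.0},
}

# Hypothetical annotation: the categories (pathways) each gene belongs to.
annotation = {
    "gene_a": ["pathway_x"],
    "gene_b": ["pathway_x", "pathway_y"],
    "gene_c": ["pathway_y"],
}


def score_gene_sets(expression, annotation):
    """Return one row per category holding a score for each sample.

    Assumptions (not taken from the product documentation):
    the reference value of a gene is its mean across samples, and a
    category score is the mean deviation of the genes in that category.
    """
    deviations_by_category = defaultdict(list)
    for gene, values in expression.items():
        reference = mean(values.values())
        deviations = {sample: v - reference for sample, v in values.items()}
        for category in annotation.get(gene, []):
            deviations_by_category[category].append(deviations)

    scores = {}
    for category, gene_deviations in deviations_by_category.items():
        samples = gene_deviations[0].keys()
        scores[category] = {
            sample: mean(d[sample] for d in gene_deviations)
            for sample in samples
        }
    return scores


scores = score_gene_sets(expression, annotation)
```

The result keeps the sample columns but replaces the gene rows with category rows, which is the shape that downstream category-based analyses would expect.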
What do I need?
Two input data sets are required to successfully run the Gene Set Scoring process:
The Input Data Set, which must contain a binary or continuous significance variable and an annotation column. The affylatin_norm_probeset.sas7bdat input data set, shown below, contains normalized expression data from Affymetrix Latin Square Data and serves as an example. It contains the Probe_Set_ID column and 59 sample columns. Each row represents one probeset, and each sample column lists intensity data for the 100 probesets.
The Experimental Design Data Set (EDDS). The affylatin_exp.sas7bdat EDDS, shown below, serves as an example. This required data set describes how the experiment was performed and provides information about the columns in the input data set. Note that one column in the EDDS must be named ColumnName, and the values in this column must exactly match the column names in the input data set. Two other columns in this data set, Array and Experiment, correspond to an index variable and the one-way experimental variable, respectively.
A third data set, the Annotation Data Set, is required only if the input data set contains no annotation data; otherwise it is optional. This data set contains information, such as gene identity or chromosomal location, for each of the markers. The u95a_trim.sas7bdat annotation data set lists the identities and biological processes of the genes targeted by the probesets. A portion of this data set is illustrated below. It is a tall data set; each row corresponds to a different marker.
The affylatin_norm_probeset.sas7bdat input data set, affylatin_exp.sas7bdat EDDS, and u95a_trim.sas7bdat annotation data set are included in the JMP Genomics Sample Data folder.
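Because the values in the EDDS ColumnName column must exactly match column names in the input data set, it can help to check for mismatches before running the process. The following Python sketch shows one way to do that on data already loaded into memory. The function name unmatched_column_names and the sample column names are hypothetical, and the sketch does not read .sas7bdat files itself.

```python
def unmatched_column_names(input_columns, edds_rows):
    """Return the ColumnName values with no exact match in the input columns.

    The comparison is case-sensitive and does not trim whitespace,
    reflecting the requirement that names match exactly.
    """
    available = set(input_columns)
    return [
        row["ColumnName"]
        for row in edds_rows
        if row["ColumnName"] not in available
    ]


# Hypothetical column names from an input data set.
input_columns = ["Probe_Set_ID", "Sample_1", "Sample_2"]

# Hypothetical EDDS rows; the second ColumnName differs only in case.
edds_rows = [
    {"ColumnName": "Sample_1", "Array": 1, "Experiment": 1},
    {"ColumnName": "sample_2", "Array": 2, "Experiment": 1},
]

unmatched = unmatched_column_names(input_columns, edds_rows)
```

In this toy case, unmatched contains "sample_2", because it differs in case from the input column Sample_2.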
For detailed information about the files and data sets used or created by JMP Life Sciences software, see Files and Data Sets.
Output/Results
Refer to the Gene Set Scoring output documentation for detailed descriptions of the output of this process.