PCA for Population Stratification
Running this process using the GeneticMarkerExample sample setting generates the tabbed Results window. Refer to the PCA for Population Stratification process description for more information. Output from the process is organized into tabs. Each tab contains one or more plots, data panels, data filters, and other elements that facilitate your analysis.
Tabs
This pane enables you to access and view the output plots and associated data sets on each tab. Use the drop-down menu to view the tab in the Tab Viewer pane, open the tab in a new window, or remove the tab and its contents from the Tab Viewer pane.
The following tabs are generated by this process:
PCA 2D Row Scores: The PCA 2D Row Scores tab shows a 2-D scatterplot matrix of correlations between principal components, along with other visualizations of the PCs, with points/lines colored and/or labeled by the Color and Label Variables, respectively. This tab is generated only when the Display principal components plots check box has been checked.
PCA 3D Row Scores: The PCA 3D Row Scores tab shows a 3-D scatterplot of three of the principal components, with points colored and/or labeled by the Color and Label Variables, respectively. This tab is generated only when the Display principal components plots check box has been checked.
Scree Plot: This tab displays a plot of the eigenvalue for the ith component versus i to show the proportion of variation explained by the principal components. This tab is generated only when the Display principal components plots check box has been checked.
Summary Chart: When there are multiple annotation groups (chromosomes or genes, for example), this tab displays the number of significant markers in each annotation group for each test. Separate bar charts are shown for each BY group when any BY variables are specified. This tab is open by default.
Manhattan Plot: When there are multiple annotation groups (chromosomes or genes, for example), this tab displays a scatter plot of the p-values across all annotation groups.
Annotation Group Results: When there are multiple annotation groups (chromosomes or genes, for example), a separate Results tab with an overlay plot of p-value by chromosome location is created for each annotation group. If the Calculate trend odds ratios check box was checked, this tab also contains a Volcano Plot of p-value by log odds ratio for all markers.
In this example, there are two annotation groups (CandGene 1 and CandGene 2) and, thus, two Annotation Group Results tabs (CandGene 1 Results and CandGene 2 Results).
All P-Value Plots: When there are multiple annotation groups (chromosomes or genes, for example), the All P-Value Plots tab shows all the p-value plots from the Annotation Group Results tabs in a single display.
Note: When an annotation group variable is not specified or there is only one annotation group, the tab is named P-Value Plot and contains an overlay plot of p-value by chromosome location for all markers.
All Trends Odds Ratio Plots: If the Calculate trend odds ratios check box was checked and there are multiple annotation groups (chromosomes or genes, for example), this tab shows all the odds ratio volcano plots.
Volcano Plot(s): This tab displays a scatter plot of p-value by the Estimate of Minor Allele Genotype Effect for all markers, colored by Annotation Group, when the trend test is performed. When the Output genotype LS means and diffs box is checked, this tab includes scatter plots of p-value by the LS diffs between genotypes 0 and 1, and genotypes 0 and 2.
SAS Output: This tab contains text-based output directly from SAS/STAT PROC PRINCOMP and provides detailed statistics on the principal components analysis. Refer to the documentation for SAS PROC PRINCOMP for more information.
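To make the Scree Plot tab concrete, here is a minimal Python sketch of how a list of eigenvalues translates into the proportion of variation explained by each component. The eigenvalues and the function name `scree_table` are hypothetical, chosen only for illustration; the SAS Output tab reports the actual values for your analysis.

```python
def scree_table(eigenvalues):
    """Return (component index, eigenvalue, proportion, cumulative proportion) rows."""
    total = sum(eigenvalues)
    rows = []
    cumulative = 0.0
    for i, value in enumerate(eigenvalues, start=1):
        proportion = value / total      # share of total variation for component i
        cumulative += proportion        # running total across components 1..i
        rows.append((i, value, proportion, cumulative))
    return rows


# Hypothetical eigenvalues, sorted from largest to smallest.
example_eigenvalues = [4.0, 2.0, 1.0, 1.0]

for i, value, proportion, cumulative in scree_table(example_eigenvalues):
    print(f"PC{i}: eigenvalue={value:.2f} "
          f"proportion={proportion:.3f} cumulative={cumulative:.3f}")
```

Plotting the eigenvalue column against the component index gives the scree plot; the point where the curve flattens is a common informal guide to how many components to retain.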
Drill Downs
Action buttons provide you with an easy way to drill down into your data. The following action buttons are generated by this process:
Create Subset Genotype and Annotation Data Sets: Select points from the p-value plots and click Create Subset Genotype and Annotation Data Sets to open the Subset and Reorder Genetic Data process to create the subset data sets.
Note: This action button is not available if any By Variables are selected.
Plot Trait by Genotype: Select markers from the p-value plots and click Plot Trait by Genotype to view each marker's genotype distribution for each value of the Trait Variables.
Note: This action button is available only when numeric genotypes are specified.
View Venn Diagram of Significant Markers by Trait for the Test Below: Click either Genotype or Trait to view a Venn diagram showing significant association between markers and multiple traits as determined by the specific association test.
Note: This option is available only when two or more Trait Variables are specified.
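The Venn diagram groups markers by the combination of traits for which they are significant. A rough Python sketch of that grouping follows. The marker names, trait names, p-values, and the 0.05 cutoff are all hypothetical, and the function `venn_regions` is illustrative rather than part of the software.

```python
from collections import defaultdict


def venn_regions(p_values, alpha=0.05):
    """Group markers by the exact set of traits for which p < alpha.

    p_values maps trait name -> {marker name: p-value}.
    Returns a dict mapping a frozenset of traits to the set of markers
    significant for exactly those traits. Markers that are significant
    for no trait are omitted.
    """
    traits_by_marker = defaultdict(set)
    for trait, marker_p in p_values.items():
        for marker, p in marker_p.items():
            if p < alpha:
                traits_by_marker[marker].add(trait)

    regions = defaultdict(set)
    for marker, traits in traits_by_marker.items():
        regions[frozenset(traits)].add(marker)
    return dict(regions)


# Hypothetical p-values for three markers and two traits.
example = {
    "TraitA": {"m1": 0.001, "m2": 0.20, "m3": 0.03},
    "TraitB": {"m1": 0.010, "m2": 0.04, "m3": 0.60},
}

for traits, markers in venn_regions(example).items():
    print(sorted(traits), sorted(markers))
```

Each key in the result corresponds to one region of the diagram: markers significant for a single trait fall in the non-overlapping part of that trait's circle, and markers significant for several traits fall in the overlap.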
Output Data
This process generates the following data set(s):
PCA Data Set: This data set contains the eigenvectors for each of the principal components. The name of this data set is given by the Output File Prefix, or the input data set name if none is given, with the suffix _pca. Click Open to view the data set.
EigenCorr Data Set: This data set contains the correlation statistics between each principal component and trait variable and is generated when the Perform EigenCorr to select PCs check box is checked. The name of this data set is given by the Output File Prefix, or the input data set name if none is given, with the suffix _pce. Click Open to view the data set.
Merged Data Set: When the Create merged PCA output data set check box is checked, this data set contains the columns from the PCA output data set merged with the input data set. The name of this data set is given by the Output File Prefix, or the input data set name if none is given, with the suffix _pcm.
Trend Parameter Estimate Data Set: This data set contains the estimates and test statistics for the fixed effects included in each regression model testing for association, including the numeric marker genotype treated as a continuous variable, and is generated when the Trend test is performed. The name of this data set is given by the Output File Prefix, or the input data set name if none is given, with the suffix _pet. Click Open to view the data set.
P-value Data Set: This data set contains all the columns from the annotation data set, plus the test statistics and p-values from the tests performed. This data set can be used as the annotation data set for subsequent processes to accumulate results from multiple processes into a single data set. The name of this data set is given by the Output File Prefix, or the input data set name if none is given, with the suffix _sta. Click Open to view the data set.
For detailed information about the files and data sets used or created by JMP Life Sciences software, see Files and Data Sets.
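The naming rule for the output data sets can be summarized in a short Python sketch. The suffixes come from the descriptions above; the function name `output_names`, the example prefix, and the input data set name are hypothetical, and the sketch ignores any name-length limits or other adjustments the software might apply.

```python
# Suffix appended to the base name for each output data set.
SUFFIXES = {
    "PCA Data Set": "_pca",
    "EigenCorr Data Set": "_pce",
    "Merged Data Set": "_pcm",
    "Trend Parameter Estimate Data Set": "_pet",
    "P-value Data Set": "_sta",
}


def output_names(input_name, prefix=None):
    """Return the expected name of each output data set.

    The Output File Prefix is used as the base when given; otherwise
    the input data set name is used.
    """
    base = prefix if prefix else input_name
    return {label: base + suffix for label, suffix in SUFFIXES.items()}


# Hypothetical names, for illustration only.
print(output_names("my_genotypes"))
print(output_names("my_genotypes", prefix="run1"))
```

Which of these data sets actually exist after a run depends on the options selected, as noted in each description.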
Tab Viewer
This pane provides you with a space to view individual tabs within the Results window.
General
Click View Data to reveal the underlying data table associated with the current tab.
Click Reopen Dialog to reopen the completed process dialog used to generate this output.
Click Create Report to generate a PDF- or RTF-formatted report containing the plots and charts of selected tabs.
Click Close All to close all graphics windows and underlying data sets associated with the output.