Secondary Annotation SAS Data Set
Use this field to specify the name and complete path to the secondary annotation SAS data set.
Annotation Data Sets
An Annotation Data Set contains biological or chemical information and properties about genes, SNPs, probes, probesets, or peptides. This annotation information comes from various online bioinformatics resources maintained by government agencies, academic organizations, and commercial entities. It is used to create a custom Annotation Data Set for your analysis.
The structure of an annotation data set and the information that it provides can vary depending on the nature of the experiment, the source of the data and the application that generated it. The table below lists information commonly contained in an Annotation Data Set. Keep in mind that different providers might name annotation information differently.
Accession Number - A unique identifier given to a biological polymer sequence (such as DNA or a protein) when it is submitted to a sequence database (GenBank, EMBL, DDBJ).
Gene ID - A unique identifier assigned to a gene record in Entrez Gene. It is an integer and is species specific. For genomes that were previously represented in LocusLink, the Gene ID is the same as the Locus ID.
SNP ID - A unique identifier assigned to a single nucleotide polymorphism (SNP) when it is submitted to the dbSNP database. Also known as an 'rs' ID.
The structure of the Annotation Data Set for genetics processes differs from that of the microarray and proteomics processes.
For genetics, each row in the Annotation Data Set represents a marker or SNP used in the analysis. The variables typically contain the following information: a name or identifier for each marker, the chromosome or candidate gene on which it is located, its location (in kilobases or centiMorgans, for example), and an accession number that can be used to retrieve more information about the locus from a publicly available online database. You can specify this data set on the Annotation tab, found on most of the process dialogs, where its columns can be assigned to the following roles:
Annotation Label Variable - the name or ID variable that is used to label markers in the output
Annotation Group Variable - the variable, such as chromosome, that can be used to group the analyses and output
Annotation Location Variable - the variable containing marker locations to be used to accurately represent distances between markers in p-value plots
Accession Number Variable - the variable containing an accession number (a GenBank accession number or a dbSNP reference SNP ID, for example) that is used to create buttons on p-value plots; these buttons provide direct access to the web page for the selected marker in the appropriate online database
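As a rough illustration of the layout and roles listed above, the following Python sketch models a small genetics annotation data set as a list of rows and uses the group and location roles to order markers within each chromosome. The column names (Marker, Chromosome, Location_kb, Accession), the role dictionary, and all values are hypothetical placeholders. The code is not part of JMP Life Sciences; it only mirrors the kind of role assignment made on the Annotation tab.

```python
from collections import defaultdict

# Hypothetical annotation rows: one row per marker (all values are made up).
annotation = [
    {"Marker": "m3", "Chromosome": "2", "Location_kb": 310.0, "Accession": "acc_c"},
    {"Marker": "m1", "Chromosome": "1", "Location_kb": 120.5, "Accession": "acc_a"},
    {"Marker": "m2", "Chromosome": "1", "Location_kb": 45.0, "Accession": "acc_b"},
]

# Hypothetical mapping of roles to columns, analogous to the Annotation tab fields.
roles = {
    "label": "Marker",          # Annotation Label Variable
    "group": "Chromosome",      # Annotation Group Variable
    "location": "Location_kb",  # Annotation Location Variable
    "accession": "Accession",   # Accession Number Variable
}


def markers_by_group(rows, roles):
    """Group rows by the group variable and order each group by location."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[roles["group"]]].append(row)
    return {
        group: [
            r[roles["label"]]
            for r in sorted(members, key=lambda r: r[roles["location"]])
        ]
        for group, members in sorted(groups.items())
    }


ordered = markers_by_group(annotation, roles)
print(ordered)
```

Here the label column supplies the marker names, the group column splits the markers by chromosome, and the location column determines their order within each group.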
This tab also allows conditional inclusion of markers in your analysis based on particular values of variables from the Annotation Data Set. The criteria can be entered in the Filter to Include Variables field in accordance with SAS syntax for WHERE statements.
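The effect of such a filter can be sketched outside of SAS. The Python example below keeps only the markers that satisfy a condition, much as a WHERE expression restricts the rows that are read. The column names, the values, and the condition itself are hypothetical, and the expression actually entered in the field must follow SAS WHERE syntax rather than Python syntax.

```python
# Hypothetical annotation rows (all names and values are made up).
annotation = [
    {"Marker": "m1", "Chromosome": 1, "Location_kb": 120.5},
    {"Marker": "m2", "Chromosome": 1, "Location_kb": 945.0},
    {"Marker": "m3", "Chromosome": 2, "Location_kb": 310.0},
]


def include_marker(row):
    """Hypothetical inclusion rule: chromosome 1 markers within the first 500 kb."""
    return row["Chromosome"] == 1 and row["Location_kb"] <= 500


# Only markers whose annotation row satisfies the rule stay in the analysis.
included = [row["Marker"] for row in annotation if include_marker(row)]
print(included)
```

Markers whose annotation values do not meet the condition are simply left out of the analysis; the annotation data set itself is not modified.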
For the microarray and proteomics processes, the Annotation Data Set must contain a merge key variable whose values exactly match those of some variable in a tall data set.
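Because the match must be exact, it can be useful to check the merge key before running a process. The following Python sketch compares the key values of an annotation data set against the corresponding variable in a tall data set and reports annotation keys that have no exact match. The variable names (ProbeID, Probe, GeneSymbol, Array, Intensity) and all values are hypothetical, and the check is only an illustration, not a feature of the software.

```python
# Hypothetical tall data set: one row per probe per array (values are made up).
tall = [
    {"Probe": "p1", "Array": "a1", "Intensity": 7.2},
    {"Probe": "p2", "Array": "a1", "Intensity": 5.9},
    {"Probe": "p3", "Array": "a1", "Intensity": 6.4},
]

# Hypothetical annotation data set; "P2" differs from "p2" only by case.
annotation = [
    {"ProbeID": "p1", "GeneSymbol": "g1"},
    {"ProbeID": "P2", "GeneSymbol": "g2"},
    {"ProbeID": "p3", "GeneSymbol": "g3"},
]


def unmatched_keys(annotation_rows, annotation_key, tall_rows, tall_key):
    """Return annotation key values with no exactly equal value in the tall data."""
    tall_values = {row[tall_key] for row in tall_rows}
    annotation_values = {row[annotation_key] for row in annotation_rows}
    return sorted(annotation_values - tall_values)


missing = unmatched_keys(annotation, "ProbeID", tall, "Probe")
print(missing)
```

In this sketch the comparison is case sensitive, so a key such as "P2" is reported as unmatched even though "p2" appears in the tall data set.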
For detailed information about the files and data sets used or created by JMP Life Sciences software, see Files and Data Sets.
To Specify an Annotation Data Set:
The method used for this specification varies depending on whether JMP is connected to SAS on your local machine or to SAS on a server. Refer to the Specifying Folders, Files, and Data Sets documentation for detailed information.
To View the Contents of the Specified Data Set:
Click Open.