Affymetrix SNP CEL Input Engine

The Affymetrix SNP CEL Input Engine imports a set of .cel files, generated by hybridizing genomic DNA to SNP Genotyping GeneChips, and combines the data into a single SAS data set.

What do I need?

Before you can successfully import the raw data into SAS data sets that can be used for analysis in JMP Genomics, you must locate and gather several sources of information.

• An Experimental Design File (EDF) that indexes the individual .cel files for the experiment. The EDF is typically a text file or Excel spreadsheet and must be created before the data can be imported. Alternatively, you can generate an EDF by using the Affymetrix ARR File Parser to parse the .arr files associated with each .cel file. The ARR File Parser is a JMP Scripting Language (JSL) script that parses information from all of the .arr files into two JMP files. The first output file (named ARR.jmp by default) contains all the experimental parameters listed in all of the .arr files. The second output file (named EDF.jmp by default) lists only those variables that vary across chips. This EDF.jmp file can be modified to serve as an EDF for subsequent JMP Genomics processes.

• All of the .cel files containing the raw data. They must be located and copied to a single folder. Each .cel file corresponds to an individual microarray and contains the hybridization intensities for that array.

• One or more specific library files. These files contain information used to associate individual data points extracted from the .cel files with corresponding probesets. Library files are available for download from the Affymetrix website.

• Two SAS data sets, referred to as the SNP Annotation data set and the Copy Number Annotation data set. These data sets contain annotation information imported from the Affymetrix .CSV annotation file for the specific SNP GeneChips used in the experiments. Each row in the SNP Annotation data set corresponds to a separate SNP probeset. Each row in the Copy Number Annotation data set corresponds to a separate copy number probeset. Required variables for both data sets include Probe_Set_ID, Chromosome, and Physical_Position. To generate each of these data sets, first download the specific .CSV file from the Affymetrix website and then import the data into a SAS data set using the Import Individual Text, CSV, or Excel Files process.
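Before running the import, it can help to confirm that an annotation table carries the three required variables named above. The following Python sketch is illustrative only and is not part of JMP Genomics. It assumes the column names are available as CSV text (for example, exported from the imported data set) and that any comment lines begin with "#". The function names and the sample header are hypothetical.

```python
import csv
import io

# Required variables named in the documentation for both annotation data sets.
REQUIRED_VARIABLES = ("Probe_Set_ID", "Chromosome", "Physical_Position")


def missing_required_variables(column_names, required=REQUIRED_VARIABLES):
    """Return the required variables that are absent from column_names."""
    present = {name.strip() for name in column_names}
    return [name for name in required if name not in present]


def read_header(csv_text):
    """Return the first non-comment row of CSV text as a list of column names.

    Assumption: comment lines begin with '#'. Adjust if your file differs.
    """
    lines = (line for line in io.StringIO(csv_text) if not line.startswith("#"))
    return next(csv.reader(lines), [])


# Hypothetical header and data row, for illustration only.
sample = "#comment line\nProbe_Set_ID,Chromosome,Allele_A\nSNP_A-1,X,A\n"
header = read_header(sample)
missing = missing_required_variables(header)
print(missing)  # ['Physical_Position']
```

An empty list means all three required variables are present. Any names returned must be added or renamed before the data set can serve as an annotation data set.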

A subset of the Chromosome X Titration data set, available on DVD from Affymetrix, serves as an example. The full copy number variation data set consists of five replicates of each of five samples. Three of the samples have abnormal numbers of copies of the X chromosome: three, four, and five copies, respectively. The remaining two samples are from a normal male and a normal female. Ten of the files were selected for use in this example and saved in the chrX_titration folder created in the JMP Genomics Sample Data folder.

The EDF was generated from information contained in the .arr files associated with each of the raw data files. The files were parsed using the Affymetrix ARR File Parser process. The required ColumnName variable was added to the resulting EDF.jmp file using the Create ColumnName process, and the modified EDF (exp_x_10) was saved as a SAS data set in the chrX_titration folder.
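The reduction that the ARR File Parser performs, from the full parameter table (ARR.jmp) to the variables that vary across chips (EDF.jmp), can be pictured with a short Python sketch. This is not the parser's JSL code. It is a minimal illustration of the idea, and the function name, parameter names, and values are hypothetical.

```python
def varying_variables(records):
    """Return names of variables whose values differ across records.

    Each record is a dict of parameter name to value for one chip.
    Names are returned in the order they are first seen.
    """
    names = []
    for record in records:
        for name in record:
            if name not in names:
                names.append(name)

    varying = []
    for name in names:
        # A variable missing from a record counts as the value None.
        values = {record.get(name) for record in records}
        if len(values) > 1:
            varying.append(name)
    return varying


# Hypothetical parameters for three chips, for illustration only.
chips = [
    {"ArrayType": "GenomeWideSNP_6", "Sample": "A", "Replicate": "1"},
    {"ArrayType": "GenomeWideSNP_6", "Sample": "A", "Replicate": "2"},
    {"ArrayType": "GenomeWideSNP_6", "Sample": "B", "Replicate": "1"},
]
edf_columns = varying_variables(chips)
print(edf_columns)  # ['Sample', 'Replicate']
```

Here ArrayType is constant across all chips, so it is dropped. Sample and Replicate vary, so they are the kind of variables that would remain in an EDF.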

The GenomeWideSNP_6.Full.cdf file was downloaded from the Affymetrix website and saved in the chrX_titration folder.

The genomewidesnp_6_na22_annot.sas7bdat SNP annotation file and the genomewidesnp_6_cn_na23_annot.sas7bdat copy number annotation file were generated by downloading .csv files from NetAffx and importing them with the Import Individual Text, CSV, or Excel Files process. Both files were saved in the chrX_titration folder.

For detailed information about the files and data sets used or created by JMP Life Sciences software, see Files and Data Sets.

Output/Results

The output data sets generated by this process are listed in a Results window. Refer to the Affymetrix SNP CEL Input Engine output documentation for detailed descriptions.