Process Description

Imputed SNP (Wide Format) Input Engine

The Imputed SNP (Wide Format) Input Engine imports a set of files created by a SNP imputation program, such as MACH (Li and Abecasis 2006). This process outputs two different SAS genotype data sets that can be used for subsequent analyses.

The output genotype threshold data set is a wide data set listing the most likely genotype for each marker. For any marker where no genotype's probability meets the specified threshold, a missing value symbol (.) is listed instead.
The output genotype probabilities data set lists genotype probabilities in a stacked format.
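
The thresholding rule described above can be illustrated with a minimal Python sketch. This is not the engine's implementation. The function name call_genotype, the genotype labels, and the threshold values are hypothetical, and the sketch assumes that "meets the threshold" means greater than or equal to it.

```python
from typing import Sequence

MISSING = "."  # missing value symbol used in the genotype threshold data set


def call_genotype(probs: Sequence[float], labels: Sequence[str], threshold: float) -> str:
    """Return the most likely genotype, or '.' if its probability is below the threshold.

    Assumption: a probability equal to the threshold counts as meeting it.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    # Index of the genotype with the highest probability.
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best] if probs[best] >= threshold else MISSING


# Hypothetical genotype labels for one marker.
labels = ["A/A", "A/G", "G/G"]
confident_call = call_genotype([0.95, 0.04, 0.01], labels, threshold=0.9)
uncertain_call = call_genotype([0.50, 0.40, 0.10], labels, threshold=0.9)
```

In this sketch, the first marker is assigned A/A because its highest probability exceeds the threshold, while the second is set to the missing value symbol because no genotype reaches 0.9.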

Note: Unlike the Imputed SNP (Tall Format) Input Engine, the Imputed SNP (Wide Format) Input Engine does not generate an Annotation Data Set.

Consult the Imputed SNP Import Tutorial (Genomics > Import > Other Genetics > Imputed SNP Import Tutorial) for help on which options to use for your particular files. You should also refer to Data Sets Used in JMP Genomics Processes for information about data set formats.

What do I need?

The genotype probability file(s) must be in wide format, where sets of genotype probability columns correspond to SNPs, and individuals are in rows. With the options provided, files from programs such as MACH can be imported and analyzed. MACH-generated probability files have the .mlprob extension.
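
To make the relationship between the wide layout and a stacked layout concrete, here is a minimal Python sketch that splits one wide row into one record per SNP and genotype. It is an illustration only, not the engine's code. The function name stack_wide_row and the parameters id_columns and probs_per_snp are hypothetical; the number of leading identifier columns and the number of probability columns per SNP depend on the imputation program and are assumptions here.

```python
from typing import List, Sequence, Tuple

Record = Tuple[Tuple[str, ...], str, int, float]


def stack_wide_row(
    fields: Sequence[str],
    snp_names: Sequence[str],
    id_columns: int = 1,      # assumption: number of leading identifier columns
    probs_per_snp: int = 3,   # assumption: probability columns per SNP
) -> List[Record]:
    """Convert one wide row (one individual) into stacked records.

    Each record is (identifiers, SNP name, genotype index, probability).
    """
    ids = tuple(fields[:id_columns])
    values = [float(v) for v in fields[id_columns:]]
    if len(values) != len(snp_names) * probs_per_snp:
        raise ValueError("row length does not match the number of SNPs")
    records: List[Record] = []
    for s, snp in enumerate(snp_names):
        start = s * probs_per_snp
        for g in range(probs_per_snp):
            records.append((ids, snp, g, values[start + g]))
    return records


# Hypothetical row: one individual, two SNPs, three probabilities each.
row = ["ind1", "0.9", "0.1", "0.0", "0.2", "0.5", "0.3"]
stacked = stack_wide_row(row, ["snp1", "snp2"])
```

Each individual contributes one row in the wide file but several records in the stacked result, one for every SNP and genotype combination.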

Optional files include both pedigree and data files, also in a format used by MACH software. They can be imported and combined with the genotype data.

A pedigree file is a tab- or space-delimited file with the first five columns representing the family identifier, individual identifier, father and mother IDs, and gender, respectively. The remaining columns are named in the specified data file. You must specify a data file when a pedigree file is specified.
The data file must correspond to the pedigree file. Its rows give the names and types of the sixth and subsequent columns of the pedigree file. Each row contains a code indicating the type of variable (such as M for marker genotype, C for covariate, or T for trait), followed by a space and the name of the variable.
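
The correspondence between the two files can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the engine's parser. The function names, the fixed column names, and the sample values are hypothetical. The sketch also assumes that every variable in the data file occupies exactly one pedigree column, which suits covariate (C) and trait (T) rows; marker genotype (M) rows may not follow that layout.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical names for the five fixed pedigree columns.
FIXED_COLUMNS = ["family_id", "individual_id", "father_id", "mother_id", "gender"]


def parse_data_file(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Read (type code, variable name) pairs from the rows of a data file."""
    variables = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) < 2:
            raise ValueError(f"expected a type code and a name, got: {line!r}")
        variables.append((parts[0], parts[1]))
    return variables


def label_pedigree_row(row: str, variables: List[Tuple[str, str]]) -> Dict[str, str]:
    """Name the fields of one pedigree row using the data file's variable names.

    Assumption: each variable occupies exactly one pedigree column.
    """
    fields = row.split()  # handles both tab- and space-delimited rows
    names = FIXED_COLUMNS + [name for _code, name in variables]
    if len(fields) != len(names):
        raise ValueError("pedigree row does not match the data file")
    return dict(zip(names, fields))


# Hypothetical data file rows and one matching pedigree row.
variables = parse_data_file(["T height", "C age"])
person = label_pedigree_row("fam1\tind1\t0\t0\t1\t172.5\t34", variables)
```

Here the sixth and seventh pedigree columns are labeled height and age because those are the first and second rows of the data file.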

The following example uses the sample.mlprob genotype probability file, the sample_nogeno.ped pedigree file, and the sample_nogeno.dat data file included in the Sample Data folder.

Note: Although the pedigree file used in this example does not contain columns of marker genotypes, most pedigree files do. Similarly, although the data file used in this example does not contain rows identifying the marker genotypes listed in the pedigree file, most data files do.

For detailed information about the files and data sets used or created by JMP Genomics software, see Files and Data Sets.

Output/Results

The output data sets generated by this process are listed in a Results window. Refer to the Imputed SNP (Wide Format) Input Engine output documentation for detailed descriptions.