Introduction to Life Sciences

Overview of Genetic Analysis

Life Sciences provides methods in JMP that help you analyze your genetic data and use that data to simulate a breeding program, so that you can predict the optimal genetic crosses to make.

• The Marker Statistics platform provides a way to explore various properties of all biallelic markers in a data set. It focuses on quality control (QC) and on identifying markers that you might want to exclude from the analysis. See “Marker Statistics”.

• The Marker Simulation platform simulates the progeny from a specified set of crosses using biallelic markers and predictor formulas. The predictor formulas are generated by using the Response Screening platform (see “Response Screening” in Predictive and Specialized Modeling) and saved in your data table. This process enables you to test various crosses and estimate which ones will generate progeny with the desired trait combinations. See “Marker Simulation”.

• The Marker Relatedness platform assesses different measures of genetic relatedness between pairs of individuals based on their genetic markers. The output of the platform is a genomic relationship matrix with n rows and n columns, where n is the number of individuals in the data table. You can use the matrix to fit GBLUP models under the Fit Model platform with the Response Screening personality. You can also use the relationship matrix to assess groups of related individuals via principal component analysis and clustering. See “Marker Relatedness”.

• The Marker Admixture platform estimates probabilities of ancestral origins (admixture probabilities) for individuals based on a series of marker genotypes.

• The Marker Imputation platform imputes missing numeric marker genotypes for diploid or polyploid organisms using linkage disequilibrium k-nearest neighbor imputation or other methods. First, distances between each marker with missing genotypes and all other markers are computed, and a set of closest markers is selected for each such marker. Next, the selected set of markers is used to compute distances between samples with missing genotypes and all other samples. Finally, the sets of closest samples and closest markers are used to impute the missing genotypes.

• The Normalization platform normalizes a table of raw data, using a variety of methods, and either saves the data to a new JMP table or replaces the raw data in the existing table.

• The Distance Matrix platform computes various measures of distance or dissimilarity between the observations (rows) of a data set. Available measures include Euclidean (the default), Jaccard, Bray-Curtis, Gower, and Hamming. These proximity measures are stored as a square matrix in an output data set. You can use the output distance matrix for hierarchical clustering and heatmap visualizations.

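As a rough illustration of the square matrix that the Distance Matrix platform produces, the following Python sketch computes pairwise Euclidean and Hamming distances between the rows of a small table. The function names (euclidean, hamming, distance_matrix) are hypothetical, and Hamming distance is expressed here as the fraction of differing positions, which is an assumption about scaling rather than a statement of how JMP reports it.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two numeric rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming(a, b):
    """Fraction of positions at which two rows differ (assumed scaling)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(rows, metric=euclidean):
    """Return a symmetric n-by-n list of lists of pairwise distances."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = metric(rows[i], rows[j])
    return d


rows = [[0, 0, 1], [3, 4, 1], [0, 0, 2]]
D = distance_matrix(rows)            # Euclidean by default
H = distance_matrix(rows, hamming)
print(D[0][1])  # → 5.0
print(H[1][2])  # → 1.0
```

The result has one row and one column per observation, with zeros on the diagonal, which is the form that hierarchical clustering and heatmap tools typically expect.
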