Launch the Distance Matrix Platform

Launch the Distance Matrix platform by selecting Analyze > Life Sciences > Distance Matrix.

Figure 9.2 The Distance Matrix Launch Window

For information about the options in the Select Columns red triangle, see “Column Filter Menu” in Using JMP.

Y, Columns

Select the marker columns that you want to analyze and click Marker.

X, Grouping Columns

Analyzes the rows assigned to each level of the specified column separately. All results are presented in a single table and report.

By

Produces a separate report for each level of the By variable. If more than one By variable is assigned, a separate report is produced for each possible combination of the levels of the By variables.

Method

Select the distance measure to use for the analysis.

• Select Euclidean to compute the straight-line distance between two points in Euclidean space. This method requires continuous data and is best for geometrically structured, normalized data. It is not ideal for sparse data or for variables measured on different scales.

• Select Manhattan to compute the sum of the absolute differences across dimensions. This method works for both continuous and ordinal data, especially data with a grid-like structure. It is well suited to high-dimensional data and is more robust to outliers than Euclidean distance.

• Select Gower when your data contain a mix of variable types (continuous, ordinal, categorical, or binary). This method is well suited to sociological, ecological, or survey data. It is not ideal for large, purely numeric data sets.

• Select Bray-Curtis when you are measuring proportional differences rather than absolute differences. This method ignores shared zeros. The data should be non-negative numeric values, such as counts or abundances. It is best suited to ecological, microbiome, or species abundance data. It is not ideal when the total magnitude is particularly meaningful.

• Select Jaccard to measure dissimilarity between sets or binary vectors. This method requires binary data, in which each variable has only two possible values. It is best suited to genetic data, text mining, or set comparisons. Do not use it when counts or magnitudes matter.

• Select Binary to compute distance based on binary matching, often using the simple matching coefficient. The data must be binary. This method is best for symmetric binary attributes, where both outcomes are equally important, and not for asymmetric attributes, where one outcome is more significant than the other.

• Select Hamming to count the number of positions at which two equal-length sequences differ. Data can be categorical, binary, or sequences. It is commonly used for DNA or protein sequences, binary strings, and error detection. It is best suited to categorical sequence comparisons and least suited to continuous numeric data.
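
The following Python sketch illustrates the textbook definitions of the Euclidean, Manhattan, Bray-Curtis, and Gower measures for two rows of data. It is not the platform's implementation: the function names are hypothetical, and the Gower version assumes that numeric variables are scaled by their range across all rows and that categorical variables score 0 for a match and 1 for a mismatch. Refer to the SAS documentation for the exact formulas that the platform uses.

```python
import math
from typing import List, Optional, Sequence, Union

# Hypothetical helpers illustrating standard formulas; not JMP's own code.
Value = Union[float, str]


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of the absolute differences across dimensions."""
    return sum(abs(a - b) for a, b in zip(x, y))


def bray_curtis(x: Sequence[float], y: Sequence[float]) -> float:
    """sum|x_i - y_i| / sum(x_i + y_i) for non-negative data.

    Positions where both values are 0 add nothing to either sum,
    which is why shared zeros are ignored.
    """
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den > 0 else 0.0  # two all-zero rows: treat as identical


def numeric_ranges(rows: List[List[Value]]) -> List[Optional[float]]:
    """Range (max - min) of each numeric column; None for non-numeric columns."""
    ranges: List[Optional[float]] = []
    for col in zip(*rows):
        if all(isinstance(v, (int, float)) for v in col):
            ranges.append(max(col) - min(col))
        else:
            ranges.append(None)
    return ranges


def gower(x: Sequence[Value], y: Sequence[Value],
          ranges: Sequence[Optional[float]]) -> float:
    """Average of per-variable dissimilarities for mixed data (assumed scheme).

    Numeric variables: |a - b| / range. Categorical variables (range None):
    0 for a match, 1 for a mismatch.
    """
    parts = []
    for a, b, r in zip(x, y, ranges):
        if r is None:
            parts.append(0.0 if a == b else 1.0)
        else:
            parts.append(abs(a - b) / r if r > 0 else 0.0)
    return sum(parts) / len(parts)


print(euclidean([0.0, 3.0], [4.0, 0.0]))   # 5.0
print(manhattan([0.0, 3.0], [4.0, 0.0]))   # 7.0
print(bray_curtis([1, 0, 3], [3, 0, 1]))   # 0.5
rows = [[1.0, "a"], [3.0, "b"], [5.0, "a"]]
ranges = numeric_ranges(rows)              # [4.0, None]
print(gower(rows[0], rows[1], ranges))     # 0.75
```

Note how the range scaling in Gower puts every numeric variable on a 0 to 1 scale, which is what lets it be averaged with 0/1 categorical mismatches.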

See the SAS PROC DISTANCE documentation for details about each of these methods.
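
For the binary and sequence measures (Jaccard, Binary, and Hamming), the distance reduces to counting matches and mismatches. This Python sketch uses hypothetical function names and assumes that Binary corresponds to one minus the simple matching coefficient. It also shows why Jaccard suits asymmetric binary data: appending positions where both rows are 0 leaves the Jaccard distance unchanged but lowers the simple matching distance.

```python
from typing import Hashable, Sequence

# Hypothetical helpers illustrating standard formulas; not JMP's own code.


def _check_lengths(x: Sequence, y: Sequence) -> None:
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")


def jaccard_distance(x: Sequence[int], y: Sequence[int]) -> float:
    """1 - (positions where both are 1) / (positions where either is 1).

    Positions where both are 0 are ignored entirely.
    """
    _check_lengths(x, y)
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return 1.0 - both / either if either else 0.0  # two all-zero rows: identical


def simple_matching_distance(x: Sequence[int], y: Sequence[int]) -> float:
    """Share of positions that disagree (1 - simple matching coefficient).

    1-1 and 0-0 agreements count equally, so the measure is symmetric.
    """
    _check_lengths(x, y)
    return sum(1 for a, b in zip(x, y) if a != b) / len(x)


def hamming(x: Sequence[Hashable], y: Sequence[Hashable]) -> int:
    """Number of positions at which two equal-length sequences differ."""
    _check_lengths(x, y)
    return sum(1 for a, b in zip(x, y) if a != b)


x = [1, 1, 0, 0, 0]
y = [1, 0, 1, 0, 0]
print(jaccard_distance(x, y))            # about 0.667
print(simple_matching_distance(x, y))    # 0.4
# Append five positions where both rows are 0:
x0, y0 = x + [0] * 5, y + [0] * 5
print(jaccard_distance(x0, y0))          # unchanged, about 0.667
print(simple_matching_distance(x0, y0))  # 0.2
print(hamming("GATTACA", "GACTATA"))     # 2
```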

Missing Value Imputation

Check this option when there are missing values to be imputed.

Create New Tables for Results

Check this option to create new tables for the output data and results. When this option is cleared, the output is added to the end of the input data table.

Number of Principal Components

Use this option to specify how many principal components to calculate. All of the specified principal components are added to the output table, but only the first two are plotted.
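
If the principal components are computed from the distance matrix, one common approach is classical multidimensional scaling (principal coordinates analysis): square the distances, double-center them, and scale the leading eigenvectors by the square roots of their eigenvalues. Whether the platform computes its components this way is an assumption. The pure-Python sketch below uses hypothetical function names and simple power iteration in place of a full eigensolver, and only illustrates the idea.

```python
import math
import random
from typing import List, Tuple

# Hypothetical sketch of principal coordinates analysis; not JMP's own code.
Matrix = List[List[float]]


def double_center(dist: Matrix) -> Matrix:
    """B = -1/2 * J D^2 J, where J is the centering matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_means = [sum(row) / n for row in sq]
    grand = sum(row_means) / n
    # dist is symmetric, so column means equal row means.
    return [[-0.5 * (sq[i][j] - row_means[i] - row_means[j] + grand)
             for j in range(n)] for i in range(n)]


def top_eigenpair(b: Matrix, iters: int = 500, seed: int = 0) -> Tuple[float, List[float]]:
    """Power iteration for the dominant eigenpair of a symmetric matrix."""
    n = len(b)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left in the remaining directions
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue.
    eigval = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return eigval, v


def principal_coordinates(dist: Matrix, k: int = 2) -> Matrix:
    """First k principal coordinates of each row of a distance matrix."""
    b = double_center(dist)
    n = len(b)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        lam, v = top_eigenpair(b)
        scale = math.sqrt(max(lam, 0.0))  # clip negative eigenvalues (non-Euclidean D)
        for i in range(n):
            coords[i][axis] = v[i] * scale
        # Deflate so that the next pass finds the next axis.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Three points at positions 0, 1, and 3 on a line.
dist = [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
coords = principal_coordinates(dist, k=2)
# The first axis recovers the centered positions (-4/3, -1/3, 5/3) up to sign;
# the second axis is numerically zero because the points are collinear.
```

Plotting the first two columns of `coords` gives the kind of two-dimensional view that a distance-based PC plot provides.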

Number of Permutations

Use this option to specify the number of permutations to run.

Keep dialog open

Check this box to keep the platform dialog open after the analysis runs.

Data Format

Most of the processes in JMP assume that the input table has a particular data structure. JMP distinguishes between tall and wide data sets. A tall data table has samples as columns and molecular entities (for example, markers, genes, clones, proteins, or metabolites) as rows, whereas a wide data table is the transpose of the tall data table, with samples as rows and molecular entities as columns.

When specifying the input data set for a process, it is important to know the required form. Distance Matrix requires a wide data table. The Transpose platform under the Tables menu enables you to transform your data tables between tall and wide forms.
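
In JMP, the Transpose platform performs this reshaping for you. If you prepare data outside JMP, the operation amounts to swapping rows and columns so that each sample becomes a row. The Python sketch below uses a hypothetical helper, `transpose_table`, and a made-up two-marker table purely for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical helper for illustration; in JMP, use the Transpose platform.
Row = List[object]


def transpose_table(header: Sequence[str], rows: Sequence[Row],
                    id_name: str) -> Tuple[List[str], List[Row]]:
    """Swap rows and columns of a table whose first column holds row IDs.

    header[0] names the input ID column; header[1:] name the data columns.
    The input row IDs become the output column names, and the input data
    column names become the output row IDs (stored in a column named id_name).
    """
    new_header = [id_name] + [str(row[0]) for row in rows]
    new_rows = [
        [col_name] + [row[k] for row in rows]
        for k, col_name in enumerate(header[1:], start=1)
    ]
    return new_header, new_rows


# Tall: one row per marker, one column per sample.
tall_header = ["Marker", "Sample1", "Sample2", "Sample3"]
tall_rows = [["M1", 0, 1, 2], ["M2", 1, 1, 0]]

# Wide: one row per sample, one column per marker (the layout Distance Matrix requires).
wide_header, wide_rows = transpose_table(tall_header, tall_rows, id_name="Sample")
print(wide_header)  # ['Sample', 'M1', 'M2']
print(wide_rows)    # [['Sample1', 0, 1], ['Sample2', 1, 1], ['Sample3', 2, 0]]
```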

Want more information? Have questions? Get answers in the JMP User Community (community.jmp.com).