Data Structure


Publication date: 09/28/2021

These options describe the form of the data that is used in calculating multivariate distances:

Data as usual

Data that are rectangular with one row for each observation and one column for each variable.

Data as summarized

Data that are summarized by the levels of one or more identifying columns. When you select this option, an Object ID text box appears in the launch window. Specify the identifying columns as the Object ID. The Data as summarized option calculates level means and treats these means as your input data.
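The level-means step described above can be illustrated outside of JMP. The following Python sketch is not JMP code; the function name summarize_by_levels, the column names, and the choice to skip missing values (represented as None) when averaging are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import fmean


def summarize_by_levels(rows, id_cols, y_cols):
    """Return the mean of each y column for every combination of id_cols levels.

    Missing values are represented as None and are skipped when averaging
    (an assumption for this sketch). A level whose values are all missing
    gets a missing (None) mean.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in id_cols)
        groups[key].append(row)

    summary = {}
    for key, members in groups.items():
        means = {}
        for col in y_cols:
            values = [m[col] for m in members if m[col] is not None]
            means[col] = fmean(values) if values else None
        summary[key] = means
    return summary


# Hypothetical data: "site" plays the role of the Object ID column.
rows = [
    {"site": "A", "height": 1.0, "weight": 10.0},
    {"site": "A", "height": 3.0, "weight": 14.0},
    {"site": "B", "height": 5.0, "weight": None},
]
level_means = summarize_by_levels(rows, ["site"], ["height", "weight"])
```

Each entry of level_means corresponds to one row of the summarized data that the clustering then treats as input.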

Data is distance matrix

Data that consist of distances between observations. For n observations, the distance table should have n rows and n + 1 columns. One column (usually the first) must contain a unique identifier for each of the n observations. In each row, the remaining n columns contain the distances between that row's observation and each of the n observations. Note the following:

– The diagonal elements of the table should be zero or missing, because the distance between a point and itself is zero. Values that are not zero or missing are treated as zero, and a note appears in the report.

– The distance columns can be a symmetric square matrix, or they can be upper or lower triangular with missing entries in the lower or upper portion. If the distances are given as a square matrix, a warning appears in the report if the table is not symmetric.

– You can begin with a different data structure and then save a distance matrix. See Save Distance Matrix.

When you select the Data is distance matrix option, enter the distance columns as Y, Columns and the identifier column as Label. The Label column must have the Character data type. For an example, see Example of a Distance Matrix.
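Before building a distance table, it can help to check it against the rules listed above. The following Python sketch is not JMP code and does not reproduce the platform's internal logic; the function name check_distance_table, the use of None for missing entries, and the tolerance tol are assumptions for illustration. It sets the diagonal to zero, fills a triangular table from its mirror entries, and collects notes about nonzero diagonal values and asymmetric pairs.

```python
def check_distance_table(dist, tol=1e-12):
    """Check an n x n list of lists of distances (None marks a missing entry).

    Returns (full, notes): a copy of the table with a zero diagonal and
    triangular gaps filled from the mirror entry, plus a list of messages.
    """
    n = len(dist)
    if any(len(row) != n for row in dist):
        raise ValueError("distance columns must form an n x n table")

    full = [list(row) for row in dist]
    notes = []

    # Diagonal: zero or missing is expected; anything else is treated as zero.
    for i in range(n):
        if full[i][i] not in (None, 0):
            notes.append(f"diagonal entry {i} was nonzero; treated as zero")
        full[i][i] = 0.0

    # Off-diagonal: fill triangular gaps and flag asymmetric pairs.
    for i in range(n):
        for j in range(i + 1, n):
            upper, lower = full[i][j], full[j][i]
            if upper is None and lower is None:
                notes.append(f"no distance given for pair ({i}, {j})")
            elif upper is None:
                full[i][j] = lower
            elif lower is None:
                full[j][i] = upper
            elif abs(upper - lower) > tol:
                notes.append(f"entries ({i}, {j}) and ({j}, {i}) differ")
    return full, notes


# Hypothetical lower-triangular table with one stray diagonal value.
lower_triangular = [
    [0, None, None],
    [2.0, 0, None],
    [3.0, 1.5, 5.0],
]
full, notes = check_distance_table(lower_triangular)
```

In this example, full is a symmetric matrix and notes contains a single message about the nonzero diagonal entry in the last row.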

Data is stacked

Data that have a single response of interest and multiple rows for each object.

When you select the Data is stacked option, Attribute ID and Object ID text boxes appear in the launch window.

– Enter a single column as Y, Columns.

– Enter columns that describe groupings of the Y, Columns variable as Attribute ID. If only two columns are entered and if you select Add Spatial Measures, then you can add spatial components to be used in the cluster analysis. See Add Spatial Measures.

– Enter the identifying columns for objects as Object ID.

The analysis that is conducted is equivalent to splitting the Y, Columns variable by the Attribute ID columns and then performing hierarchical clustering without standardizing the response columns.
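The split that this equivalence refers to can be sketched in Python. This is an illustration only, not JMP code; the function name split_stacked and the wafer, x, y, and defects column names are hypothetical. Each object becomes one row, each combination of Attribute ID levels becomes one column, and the response values are carried over unchanged (no standardization). If an object has more than one row for the same attribute combination, this sketch keeps the last value, which is a simplification.

```python
def split_stacked(rows, y_col, attribute_cols, object_cols):
    """Reshape stacked rows into one row per object.

    Returns (attributes, table). attributes lists the attribute-level
    combinations in order of first appearance. table maps each object key
    to a list of responses aligned with attributes, with None wherever the
    object has no row for that combination.
    """
    attributes = []
    by_object = {}
    for row in rows:
        obj = tuple(row[c] for c in object_cols)
        attr = tuple(row[c] for c in attribute_cols)
        if attr not in attributes:
            attributes.append(attr)
        by_object.setdefault(obj, {})[attr] = row[y_col]

    table = {
        obj: [values.get(attr) for attr in attributes]
        for obj, values in by_object.items()
    }
    return attributes, table


# Hypothetical wafer data: one row per die, identified by (x, y) position.
stacked = [
    {"wafer": "W1", "x": 0, "y": 0, "defects": 1},
    {"wafer": "W1", "x": 0, "y": 1, "defects": 0},
    {"wafer": "W2", "x": 0, "y": 0, "defects": 3},
]
attributes, table = split_stacked(stacked, "defects", ["x", "y"], ["wafer"])
```

Here wafer W2 has no row for die position (0, 1), so its split row contains a missing value in that column.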

Tip: Use this option together with the Add Spatial Measures option to perform two-dimensional spatial clustering. For example, wafer data are often recorded using one row for each die, whereas the objects of interest for clustering are the wafers. See Example of Wafer Defect Classification Using Spatial Measures.

Caution: Because there is a single measurement column, the Standardize Data option is not appropriate for stacked data.

Not Enough Nonmissing Data Alert

The JMP alert Not enough nonmissing data can be difficult to understand when you are using the Data as summarized or Data is stacked data structures. The alert occurs in the following situations:

• For Data as usual, when all rows or all but one row are missing at least one value for a Y, Columns variable.

• For Data as summarized, when, after your data are summarized across the Object ID columns, all rows or all but one row are missing at least one value of the summarized Y, Columns variables. To see the data structure that the Cluster platform is analyzing, select Tables > Summary, and then enter the Object ID columns as Group and the Y, Columns variables as Statistics > Mean.

• For Data is stacked, when, after your data are split across the Attribute ID columns, all rows or all but one row are missing at least one value of the split Y, Columns values. To see the data structure that the Cluster platform is analyzing, select Tables > Split, and then enter the Attribute ID columns as Split By, the Y, Columns variable as Split Columns, and the Object ID columns as Group.
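In all three situations, the condition amounts to having fewer than two rows with no missing values in the table that is actually analyzed. The following Python sketch states that condition directly. It is an illustration, not JMP code; the function name not_enough_nonmissing and the use of None for missing values are assumptions.

```python
def not_enough_nonmissing(table_rows):
    """Return True when fewer than two rows are complete.

    table_rows is the analyzed table (the raw, summarized, or split data),
    given as a list of rows, each a list of values with None for missing.
    """
    complete_rows = sum(
        1 for row in table_rows if all(value is not None for value in row)
    )
    return complete_rows < 2


# Hypothetical split table: only the first row is complete.
split_rows = [
    [1, 0],
    [3, None],
    [None, 2],
]
alert_expected = not_enough_nonmissing(split_rows)
```

For split_rows, all but one row are missing at least one value, so alert_expected is True.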
