Multivariate Methods > Hierarchical Cluster > Hierarchical Cluster Options
Publication date: 08/13/2020

# Hierarchical Cluster Options

The Hierarchical Clustering red triangle menu contains the following options:

Color Clusters

Colors the labels for dendrogram and their associated join bars according to cluster membership. Also assigns the corresponding colors to the rows of the data table The colors update if you change the number of clusters. If you deselect this option, the colors are no longer updated based on the number of clusters.

Mark Clusters

Assigns markers to the rows of the data table corresponding to the cluster to which the row belongs. The markers update if you change the number of clusters. If you deselect this option, the markers are no longer updated based on the number of clusters.

Number of Clusters

Prompts you to enter a number of clusters and positions the dendrogram slider to that number.

Cluster Criterion

Gives the Cubic Clustering Criterion (CCC) for the entire range of number of clusters. The CCC is used to estimate the number of clusters. It can be used with any distance-based clustering algorithm. Larger values of the CCC indicate better fit in terms of number of clusters. See SAS Institute Inc. (1983). (Not available when Data is distance matrix is selected.)

Show Dendrogram

Shows or hides the Dendrogram report.

Dendrogram Scale

Contains the following options for scaling the dendrogram:

Distance Scale

Shows the horizontal distances between any two join points as the distances between the two clusters joined at that point, based on the distance method specified on the launch window. The distance scale is the same scale as used in the Distance Graph and is the default scale for the dendrogram.

Even Spacing

Shows the horizontal distances between any two join points as equal.

Geometric Spacing

Increases the horizontal distances between join points as the number of clusters increases. This option is useful when there are many objects and you want the smaller clusters to be more visible than the larger clusters.

Distance Graph

Shows or hides the distance plot beneath the dendrogram.

Show NCluster Handle

Shows or hides the handles on the dendrogram used to manually change the number of clusters.

Zoom to Selected Rows

Selects and enlarges a particular cluster after you select the cluster in the dendrogram. Alternatively, you can double-click the cluster to zoom in on it. Use Release Zoom to return to the original view.

Release Zoom

Returns the dendrogram to the original view after zooming.

Pivot on Selected Cluster

Reverses the order of the two sub-clusters of the currently selected cluster.

Color Map

Gives the option to add a color map, or heat map, showing each Y, Column variable colored by value. Several color theme choices are available in a submenu.

Two Way Clustering

Clusters by the variables specified in Y, Columns as well as rows. A color map is added with a dendrogram for the Y, Column variables at its base. Typically, for two-way clustering, your variables are measured on the same scale and you do not select Standardize Data. (Not available when Data is stacked is selected.)

Positioning

Provides options for changing the positions of labels and other parts of the dendrogram.

Legend

Shows or hides a legend for the colors used in color maps. This option is available only if a color map is enabled.

More Color Map Columns

Adds a color map for specified columns. (Not available when Data as summarized, Data is distance matrix, or Data is stacked is selected.)

Constellation Plot

Shows or hides an alternative way to present the information in the hierarchical clustering dendrogram. Each observation (row) is represented by an endpoint and each cluster join is represented by a new point. The lines that are drawn represent cluster membership. The lengths of the lines represent the distance between clusters. Longer lines represent greater distances between clusters.

You can position your pointer over the lines in the constellation plot to see their length. However, the length values are meaningful only with respect to each other. The axis scaling, orientation of points, and angles of the lines are arbitrary. They are determined such that the ends of the nodes are spaced out and the plot does not appear cluttered, which is important with larger data sets.

To turn off the labels on the endpoints, right-click inside the Constellation Plot and deselect Show Labels.

Save Constellation Coordinates

Saves the coordinates of the constellation plot to the data table. (Not available when Data as summarized, Data is distance matrix, or Data is stacked is selected.)

Save Clusters

Creates a data table column that contains the cluster number. If Add Spatial Measures is selected on the launch window, the cluster numbers are also saved to the Hough Data Table.

Save Formula for Closest Cluster

Creates a data table column that contains a formula for the closest cluster. This option calculates the squared Euclidean distance to each cluster’s centroid and selects the cluster that is closest. Note that this formula does not always reproduce the cluster assignment given by Hierarchical Clustering since the clusters are determined differently. However, the cluster assignment is very similar. (Not available when Data as summarized, Data is distance matrix, or Data is stacked is selected.)

Save Display Order

Creates a data table column that contains the order in which the row appears in the dendrogram.

Save Cluster Hierarchy

Creates a data table that contains the information needed to write a script for a custom dendrogram. For each cluster join, there are three rows: the first for the joiner, the second for the leader, and the third for the result, giving the cluster centers, size, and other information.

Save Cluster Tree

Creates a new data table that contains information needed to compare cluster trees between JMP and SAS. For each cluster join, there is one row for each new cluster, with the cluster’s size and other information.

Save Distance Matrix

Creates a new data table that contains the distances between the observations.

Save Cluster Means

Creates a new data table that contains the number of rows and the means of each column in each cluster.

Cluster Summary

(Not available when Data is distance matrix is selected.) Displays the following information:

Cluster Means

A table that gives, for each cluster, the number of observations (or Object IDs, if the data are stacked) and means for each variable.

Cluster Standard Deviations

A table that gives, for each cluster, the number of observations (or Object IDs, if the data are stacked) and standard deviations for each variable.

Cluster Means Plot

Either a parallel plot or a two-dimensional heat map of the cluster means.

The plot is a parallel plot unless Data is stacked is selected and there are two Attribute ID variables. For the parallel plot, the axis for each variable is scaled as follows:

If Standardize Data were selected, the axis ranges from two standard deviations above and below the mean, where the standard deviation and mean are computed for the raw data. If a cluster mean falls beyond this range, the axis is extended to include it.

If Standardize Data were not selected, there is a common vertical axis whose scaling is displayed. (The scaling is equivalent to the Scale Uniformly option in Graph Builder).

When Data is stacked is selected and there are two Attribute ID variables, two-dimensional plots of the mean of the Y variable at each location are shown for each cluster. These plots are colored using a Blue to Gray to Red color gradient.

Column Summary

For each variable, gives the RSquare value that represents the proportion of variation explained by the clusters. This number is the RSquare value for a regression of the variable on the clusters. The option also gives a bar graph of RSquare values.

Scatterplot Matrix

Creates a scatterplot matrix using all the variables. (Not available when Data as summarized, Data is distance matrix, or Data is stacked is selected.)

Parallel Coord Plots

Creates a parallel coordinate plot for each cluster. (Not available when Data as summarized, Data is distance matrix, or Data is stacked is selected.) The axes are scaled as described for the Cluster Means Plot. See Cluster Means Plot.

Cluster Treatment Comparisons

(Available only if you hold Shift and click the Hierarchical Clustering red triangle.) Select a response column and a two-level treatment column. Creates a Hierarchically Clustered Differences report.

Local Data Filter

Shows or hides the local data filter that enables you to filter the data used in a specific report.

Redo

Contains options that enable you to repeat or relaunch the analysis. In platforms that support the feature, the Automatic Recalc option immediately reflects the changes that you make to the data table in the corresponding report window.

Save Script

Contains options that enable you to save a script that reproduces the report to several destinations.

Save By-Group Script

Contains options that enable you to save a script that reproduces the platform report for all levels of a By variable to several destinations. Available only when a By variable is specified in the launch window.