Hierarchical Clustering Method

Use the radio buttons to select the method used to join clusters. The Fast Ward method is selected by default.

Available options are described in the table below:

Clustering Method

Description

Average

Choose this method to set the distance between two clusters to the average distance between pairs of observations, one from each cluster.

This method tends to join clusters with small variances and is biased toward producing clusters with the same variance.1

Centroid

Choose this method to set the distance between clusters to the squared Euclidean distance between the means of each cluster.2

This method is more robust to outliers than most other hierarchical clustering methods.

Complete

Choose this method to set the distance between clusters to the maximum distance between an observation in one cluster and an observation in the other.2

This method is biased toward producing clusters with approximately equal diameters and can be distorted by even moderate outliers.

Fast Ward

Choose this method to apply Ward’s minimum variance method more quickly for large numbers of rows.2

This method is chosen automatically for any data set containing more than 2000 rows.

Single

Choose this method to set the distance between two clusters to the minimum distance between an observation in one cluster and an observation in the other cluster.

Because there are no constraints on the shape of clusters, single linkage sacrifices performance in the recovery of compact clusters in return for the ability to detect elongated and irregular clusters. Single linkage tends to chop off the tails of distributions before separating the main clusters.

Ward

Choose this method to set the distance between clusters to the ANOVA sum of squares between the two clusters summed over all the variables. At each generation, the two clusters that are merged are the pair that minimizes the within-cluster sum of squares over all partitions obtainable by merging two clusters from the previous generation. The sums of squares are easier to interpret when they are divided by the total sum of squares to give the proportions of variance (squared semipartial correlations).

This method joins clusters to maximize the likelihood at each level of the hierarchy under the assumptions of multivariate normal mixtures, spherical covariance matrices, and equal sampling probabilities.

This method tends to join clusters with a small number of observations and is biased toward producing clusters with approximately the same number of observations. It is also very sensitive to outliers.2
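
As an illustration of the distance definitions in the table above, the following is a minimal sketch in Python with NumPy, not the software's own implementation. The function names (pairwise_distances, cluster_distances, semipartial_r2) and the sample data are hypothetical, and the sketch assumes that distances between individual observations are plain Euclidean distances. The Ward value uses the standard identity that the between-cluster ANOVA sum of squares equals the squared distance between the cluster means multiplied by n_a * n_b / (n_a + n_b).

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def cluster_distances(a, b):
    """Between-cluster distance under each linkage definition in the table."""
    d = pairwise_distances(a, b)
    n_a, n_b = len(a), len(b)
    # Squared Euclidean distance between the two cluster means.
    centroid = float(((a.mean(axis=0) - b.mean(axis=0)) ** 2).sum())
    return {
        "average": float(d.mean()),    # mean over all cross-cluster pairs
        "centroid": centroid,
        "complete": float(d.max()),    # farthest cross-cluster pair
        "single": float(d.min()),      # closest cross-cluster pair
        # Between-cluster ANOVA sum of squares, summed over all variables.
        "ward": centroid * n_a * n_b / (n_a + n_b),
    }


def semipartial_r2(a, b, data):
    """Ward distance divided by the total sum of squares of the full data."""
    total_ss = float(((data - data.mean(axis=0)) ** 2).sum())
    return cluster_distances(a, b)["ward"] / total_ss


# Hypothetical example: two clusters of two observations each.
cluster_a = np.array([[0.0, 0.0], [0.0, 2.0]])
cluster_b = np.array([[3.0, 0.0], [3.0, 2.0]])
result = cluster_distances(cluster_a, cluster_b)
proportion = semipartial_r2(cluster_a, cluster_b,
                            np.vstack([cluster_a, cluster_b]))

for name, value in result.items():
    print(f"{name:>8}: {value:.4f}")
print(f"proportion of variance: {proportion:.4f}")
```

In this example the single-linkage distance is always the smallest of the three pairwise-based values and the complete-linkage distance the largest, with the average in between. That ordering is one way to see why single linkage chains elongated clusters together readily while complete linkage favors compact clusters of similar diameter.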