Statistical Details for the Cluster Variables Platform

Variable Clustering Algorithm

The clustering algorithm iteratively splits clusters of variables and reassigns variables to clusters until no cluster meets the criterion for a further split. The initial cluster consists of all variables. The algorithm was developed by SAS and is implemented in PROC VARCLUS (SAS Institute Inc. 2020g).

Note: The algorithm uses only observations for which there are no missing values for any variable in the Y, Columns list.
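A minimal sketch of this listwise (complete-case) filtering in Python is shown below. It keeps only the rows that have a value in every selected column. The function name `complete_cases` and the encoding of missing values as `None` or NaN are assumptions for illustration, not part of JMP.

```python
import math


def complete_cases(rows):
    """Return only the rows with no missing value in any column.

    Hypothetical helper: missing values are assumed to be encoded as None or NaN.
    """
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    return [row for row in rows if not any(is_missing(v) for v in row)]


rows = [[1.0, 2.0], [None, 3.0], [4.0, float("nan")], [5.0, 6.0]]
print(complete_cases(rows))  # [[1.0, 2.0], [5.0, 6.0]]
```

A row with a missing value in even one column is dropped from every computation, so the number of usable observations can fall quickly as more columns are added.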

These are the iterative steps in the algorithm:

1. For all clusters, do the following:

a. Compute the principal components for the variables in each cluster.

b. If the second eigenvalue of every cluster is less than 1, terminate the algorithm.

2. Partition the cluster whose second eigenvalue is the largest (and greater than 1) into two new clusters using the following steps:

a. Rotate the first two principal components for the variables in the current cluster using an orthoblique rotation.

b. Define one cluster to consist of the variables in the current cluster whose squared correlations with the first rotated principal component are higher than their squared correlations with the second rotated principal component.

c. Define the other cluster to consist of the remaining variables in the current cluster. These are the variables that are more highly correlated with the second rotated principal component.

d. Compute the principal components of the two new clusters.

3. Test whether any variable in the analysis should be reassigned to a different cluster. For each variable, do the following:

a. Compute the variable’s squared correlation with the first principal component of each cluster.

b. Place the variable in the cluster for which its squared correlation is the largest. Then return to step 1.
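The eigenvalue test in step 1, the choice of cluster in step 2, and the reassignment in step 3 can be sketched in Python working only from the correlation matrix of the standardized variables. This is an illustrative sketch, not the JMP or PROC VARCLUS implementation. The function names are hypothetical, eigenvalues come from a plain cyclic Jacobi iteration, and a one-variable cluster is given a second eigenvalue of 0 by convention. The squared correlation of variable i with a cluster's first principal component is computed as (Σ_j e_j r_ij)² / λ₁, which holds when the component is the sum of the cluster's standardized variables weighted by the unit eigenvector e whose eigenvalue is λ₁. The rotation in step 2 is not shown.

```python
import math


def jacobi_eigen(a, tol=1e-22, max_sweeps=100):
    """Eigenvalues and unit eigenvectors of a small symmetric matrix (cyclic Jacobi).

    Returns (values, vectors) sorted by decreasing eigenvalue; vectors[k] pairs with values[k].
    """
    n = len(a)
    a = [list(map(float, row)) for row in a]  # work on a float copy
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes a[p][q]
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J and V <- V J (columns p and q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # A <- J^T A (rows p and q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    order = sorted(range(n), key=lambda k: a[k][k], reverse=True)
    return [a[k][k] for k in order], [[v[i][k] for i in range(n)] for k in order]


def submatrix(r, cluster):
    """Correlation matrix restricted to the variables (indices) in one cluster."""
    return [[r[i][j] for j in cluster] for i in cluster]


def second_eigenvalue(r, cluster):
    """Second-largest eigenvalue of a cluster; 0.0 for a one-variable cluster (sketch convention)."""
    values, _ = jacobi_eigen(submatrix(r, cluster))
    return values[1] if len(values) > 1 else 0.0


def should_stop(r, clusters):
    """Step 1b: stop when every cluster's second eigenvalue is less than 1."""
    return all(second_eigenvalue(r, c) < 1.0 for c in clusters)


def cluster_to_split(r, clusters):
    """Step 2 selection: index of the cluster with the largest second eigenvalue, if it exceeds 1."""
    seconds = [second_eigenvalue(r, c) for c in clusters]
    best = max(range(len(clusters)), key=lambda k: seconds[k])
    return best if seconds[best] > 1.0 else None


def reassign(r, clusters):
    """Step 3: move each variable to the cluster whose first PC it has the largest squared correlation with."""
    firsts = []
    for c in clusters:  # clusters are assumed to be non-empty
        values, vectors = jacobi_eigen(submatrix(r, c))
        firsts.append((c, values[0], vectors[0]))
    new_clusters = [[] for _ in clusters]
    for i in range(len(r)):
        squared = [sum(e * r[i][j] for e, j in zip(vec, c)) ** 2 / lam for c, lam, vec in firsts]
        new_clusters[max(range(len(squared)), key=lambda k: squared[k])].append(i)
    return new_clusters


# Two pairs of strongly correlated variables: (0, 1) and (2, 3)
R = [[1.0, 0.9, 0.0, 0.0],
     [0.9, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.9],
     [0.0, 0.0, 0.9, 1.0]]
print(should_stop(R, [[0, 1, 2, 3]]))    # False: second eigenvalue is about 1.9
print(reassign(R, [[0, 1, 2], [3]]))     # [[0, 1], [2, 3]]
print(should_stop(R, [[0, 1], [2, 3]]))  # True: each second eigenvalue is about 0.1
```

Working from the correlation matrix keeps the sketch independent of the number of observations; the raw data enter only through the pairwise correlations.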

Note: An orthoblique rotation is also known as a raw quartimax rotation. See Harris and Kaiser (1964).
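Following this note, the split in step 2 can be sketched for the two components involved. For two columns x and y (the first two eigenvectors), the raw quartimax criterion, the sum of fourth powers of the rotated entries with no row normalization, has a closed-form maximizer: φ = arg(Σ_i z_i⁴)/4, where z_i = x_i + i·y_i. This sketch assumes the rotation is applied to the eigenvectors and that each rotated component is the projection of the standardized variables on a rotated eigenvector, so that a variable's correlation with it follows from the eigenvalues alone. The function names are hypothetical; this is not the JMP or PROC VARCLUS code.

```python
import cmath
import math


def quartimax_angle(x, y):
    """Angle phi maximizing sum(u**4 + v**4) over rows for the rotated columns
    u = x*cos(phi) + y*sin(phi) and v = -x*sin(phi) + y*cos(phi) (raw: no row normalization)."""
    total = sum(complex(xi, yi) ** 4 for xi, yi in zip(x, y))
    return cmath.phase(total) / 4.0


def split_cluster(names, lam1, lam2, x, y):
    """Steps 2a-2c sketch: split a cluster given its first two eigenpairs (lam1, x) and (lam2, y)."""
    phi = quartimax_angle(x, y)
    c, s = math.cos(phi), math.sin(phi)
    # Assumed rotated weight vectors: w1 = c*x + s*y and w2 = -s*x + c*y.
    # With R x = lam1*x and R y = lam2*y, corr(variable i, Z w) = (R w)_i / sqrt(w' R w).
    sd1 = math.sqrt(lam1 * c * c + lam2 * s * s)
    sd2 = math.sqrt(lam1 * s * s + lam2 * c * c)
    first, second = [], []
    for name, xi, yi in zip(names, x, y):
        r1 = (lam1 * c * xi + lam2 * s * yi) / sd1
        r2 = (-lam1 * s * xi + lam2 * c * yi) / sd2
        # Higher squared correlation with rotated component 1 -> first cluster; otherwise second
        (first if r1 * r1 > r2 * r2 else second).append(name)
    return first, second


# Correlations: 0.9 within {a, b} and within {c, d}, 0.3 across the pairs.
# The first two eigenpairs of that matrix are (2.5, x) and (1.3, y).
x = [0.5, 0.5, 0.5, 0.5]
y = [0.5, 0.5, -0.5, -0.5]
print(sorted(split_cluster(["a", "b", "c", "d"], 2.5, 1.3, x, y)))  # [['a', 'b'], ['c', 'd']]
```

The second eigenvalue here is 1.3, so step 2 would split this cluster. The rotation turns the unrotated "general" and "contrast" components into one component per pair. Which pair is labeled first depends on the sign of the rotation angle, so the output is sorted.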
