Publication date: 11/10/2021

Example of Cluster Variables Platform for Dimension Reduction

In this example, you use the Cluster Variables platform as a dimension-reduction tool for modeling. The Penta.jmp sample data table contains 15 variables that are used to predict the response variable, logRAI. Use Cluster Variables to reduce the number of predictors.

Cluster Variables

1. Select Help > Sample Data Library and open Penta.jmp.

2. Select Analyze > Clustering > Cluster Variables.

3. Select all of the continuous variables except logRAI, and click Y, Columns.

4. Click OK.

5. Click the Variable Clustering red triangle and select Save Cluster Components.

Five grouped formula columns are added to the data table.

Figure 16.5 Cluster Variables Report for Penta.jmp

The Cluster Summary and Cluster Members reports show that the variables are clustered into five groups, so there are five Cluster Component variables.
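To make the idea of a Cluster Component concrete, the following is a minimal Python sketch, not JMP's implementation. It assumes that a cluster component is the first principal component of the standardized variables in one cluster, and it uses synthetic data rather than Penta.jmp. The function names (standardize, correlation, first_component) are hypothetical and exist only for this illustration.

```python
import math
import random


def standardize(column):
    """Center a column and scale it to unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]


def correlation(a, b):
    """Pearson correlation between two equal-length columns."""
    za, zb = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(za, zb)) / (len(a) - 1)


def first_component(columns, iterations=200):
    """Return (weights, scores) for the first principal component.

    Assumption: the component is the leading eigenvector of the
    correlation matrix, found here by simple power iteration.
    """
    z = [standardize(c) for c in columns]
    k, n = len(z), len(z[0])
    corr = [
        [sum(z[i][r] * z[j][r] for r in range(n)) / (n - 1) for j in range(k)]
        for i in range(k)
    ]
    w = [1.0] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * w[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    scores = [sum(w[i] * z[i][r] for i in range(k)) for r in range(n)]
    return w, scores


# Synthetic cluster: three noisy copies of one latent variable.
random.seed(1)
latent = [random.gauss(0, 1) for _ in range(50)]
cluster = [[t + random.gauss(0, 0.3) for t in latent] for _ in range(3)]

weights, component = first_component(cluster)
```

In this sketch, the single column component stands in for the three correlated columns in cluster, which is the sense in which five Cluster Components can replace fifteen predictors.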

Fit Models

Next, fit and compare two models to predict logRAI:

A model using all continuous variables as predictors.

A model using the Cluster Components as predictors.

1. Click the Variable Clustering red triangle and select Launch Fit Model.

2. Select logRAI and click Y.

Notice that the Most Representative Variables for the five clusters have been entered in the Construct Model Effects list. However, you want to enter all predictors.

3. Select all of the continuous variables from S1 to P5 and click Add.

Be careful not to include Obs Name.

4. Select the box next to Keep dialog open.

5. Click Run.

Figure 16.6 Fit Least Squares Report for Model with All Continuous Predictors

6. In the Fit Model window, select all variables in the Construct Model Effects list and click Remove.

7. Select the Cluster Components group and click Add.

8. Click Run.

Figure 16.7 Fit Least Squares Report for Model with Cluster Components as Predictors

The model that includes the five Cluster Components as the only predictors explains a substantial amount of the variation in the response, with an adjusted RSquare of 0.784 (Figure 16.7). The model that uses all fifteen predictors has an adjusted RSquare of 0.853 (Figure 16.6), which is only 0.069 higher.
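The comparison uses adjusted RSquare rather than RSquare because the two models have different numbers of predictors. The Python sketch below shows the standard adjustment for a least squares model with an intercept. The function name adjusted_r_squared is hypothetical, and the row count and RSquare inputs are made-up values for illustration, not values from Penta.jmp.

```python
def adjusted_r_squared(r_squared, n_rows, n_predictors):
    """Adjusted R-squared for a least squares model with an intercept.

    The unexplained fraction (1 - R^2) is scaled by (n - 1) / (n - p - 1),
    so each added predictor must earn its place.
    """
    if n_rows - n_predictors - 1 <= 0:
        raise ValueError("need more rows than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_rows - 1) / (n_rows - n_predictors - 1)


# Hypothetical inputs for illustration only; not taken from Penta.jmp.
n_rows = 30
small_model = adjusted_r_squared(0.82, n_rows, 5)    # 5 predictors
large_model = adjusted_r_squared(0.90, n_rows, 15)   # 15 predictors
```

With these made-up inputs, the gap between the two adjusted values is much smaller than the gap between the raw RSquare values, because the larger model is penalized for its ten extra predictors.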