Normal Mixtures

Normal Mixtures is an iterative technique, but it is less a clustering method for grouping rows than an estimation method for characterizing the cluster groups. Rather than classifying each row into a single cluster, it estimates the probability that a row belongs to each cluster. See McLachlan and Krishnan (1997).

The normal mixtures approach to clustering predicts the proportion of responses expected within each cluster. It assumes that the joint probability distribution of the measurement columns can be approximated by a mixture of multivariate normal distributions, one per cluster, each with its own mean vector and covariance matrix.
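To make "the probability that a row belongs to each cluster" concrete, the following Python sketch computes membership probabilities for a mixture whose parameters are already known. Each row's probability for a cluster is that cluster's weight times its normal density, divided by the sum of these products over all clusters. The function names and the two-cluster parameters are hypothetical illustrations, not the platform's internals.

```python
import numpy as np

def mvn_pdf(X, mean, cov):
    """Multivariate normal density evaluated at each row of X."""
    X = np.atleast_2d(X)
    p = X.shape[1]
    diff = X - mean
    # Solve cov @ z = diff instead of inverting cov explicitly.
    z = np.linalg.solve(cov, diff.T).T
    mahal = np.sum(diff * z, axis=1)          # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(cov)
    return np.exp(-0.5 * (p * np.log(2 * np.pi) + logdet + mahal))

def membership_probabilities(X, weights, means, covs):
    """Posterior probability that each row belongs to each cluster."""
    dens = np.column_stack(
        [w * mvn_pdf(X, m, c) for w, m, c in zip(weights, means, covs)])
    return dens / dens.sum(axis=1, keepdims=True)

# Hypothetical two-cluster mixture in two columns.
weights = [0.5, 0.5]
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2)]
X = np.array([[0.0, 0.0], [1.5, 1.5], [3.0, 3.0]])

P = membership_probabilities(X, weights, means, covs)
print(P.round(3))          # each row sums to 1; the middle row gets 0.5 / 0.5
print(P.mean(axis=0))      # estimated proportion of rows in each cluster
```

Averaging the probabilities down each column gives the estimated proportion of rows in each cluster, which is the quantity the normal mixtures approach is designed to predict.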

Hierarchical and k-means clustering methods work well when clusters are well separated, but when clusters overlap, assigning each point to one cluster is problematic. In the overlap areas, points from several clusters share the same space. If you want an accurate estimate of the total population in each group, it is especially important to use normal mixtures rather than k-means clustering, because normal mixtures bases its estimates on membership probabilities rather than on arbitrary cluster assignments at the borders.
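The following simulation illustrates why membership probabilities give better group totals than hard assignments when clusters overlap. It draws from two heavily overlapping one-dimensional normal clusters (80% and 20% of the population; the parameters are hypothetical) and estimates the share of the smaller cluster in two ways: by summing membership probabilities, and by counting rows whose most likely cluster is the smaller one. To isolate the effect of the assignment rule, the sketch uses the true mixture parameters rather than fitted ones.

```python
import math
import random

# Two heavily overlapping clusters; these parameters are illustrative assumptions.
WEIGHTS = (0.8, 0.2)
MEANS = (0.0, 1.0)
SIGMA = 1.0

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def sample(n, rng):
    """Draw n points: pick a cluster by its weight, then draw from that normal."""
    data = []
    for _ in range(n):
        k = 0 if rng.random() < WEIGHTS[0] else 1
        data.append(rng.gauss(MEANS[k], SIGMA))
    return data

def posterior_b(x):
    """Probability that x came from the smaller cluster B, given the true mixture."""
    a = WEIGHTS[0] * normal_pdf(x, MEANS[0], SIGMA)
    b = WEIGHTS[1] * normal_pdf(x, MEANS[1], SIGMA)
    return b / (a + b)

rng = random.Random(1)
xs = sample(100_000, rng)
probs = [posterior_b(x) for x in xs]

soft_share = sum(probs) / len(xs)                    # sum of membership probabilities
hard_share = sum(p > 0.5 for p in probs) / len(xs)   # count of hard assignments

print(f"true share of B: {WEIGHTS[1]:.3f}")
print(f"soft estimate:   {soft_share:.3f}")
print(f"hard estimate:   {hard_share:.3f}")
```

The summed probabilities recover the true 20% share up to sampling noise, while hard assignment sends most of the overlap region to the larger cluster and badly undercounts the smaller one.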

To perform Normal Mixtures, select it from the Method menu of the Iterative Clustering Control Panel (see Iterative Clustering Control Panel). The control panel then changes to the one shown in Normal Mixtures Control Panel.

Normal Mixtures Control Panel

Some of the options on the panel are described in K-Means Control Panel. The other options are described below:

Diagonal Variance

is used to constrain the off-diagonal elements of the covariance matrix to zero. In this case, the platform fits multivariate normal distributions that have no correlations between the variables.

This is sometimes necessary to avoid a singular covariance matrix when there are fewer observations than columns.

Outlier Cluster

is used to fit a Uniform cluster to catch any outliers that do not fall into any of the Normal clusters. If this cluster is created, it is designated cluster 0.

Tours

is the number of independent restarts of the estimation process, each with different starting values. Multiple tours help guard against converging to a local, rather than global, solution.

Maximum Iterations

is the maximum number of iterations of the convergence stage of the EM algorithm.

Converge Criteria

is the change in the likelihood between successive iterations below which the EM iterations stop.
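The options above map onto a standard expectation-maximization (EM) fit of a normal mixture. The following Python sketch shows one way they could interact: `tours` random restarts, each run for at most `max_iter` EM iterations, stopping early when the log-likelihood improves by less than `tol`, with `diagonal=True` zeroing the off-diagonal covariance elements. All names are hypothetical; using the log-likelihood for the convergence test, the random-row starting means, and the small ridge added to each covariance are assumptions of this sketch, not documented platform behavior.

```python
import numpy as np

def log_mvn(X, mean, cov):
    """Log density of a multivariate normal at each row of X."""
    p = X.shape[1]
    diff = X - mean
    z = np.linalg.solve(cov, diff.T).T
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (p * np.log(2 * np.pi) + logdet + np.sum(diff * z, axis=1))

def e_step(X, weights, means, covs):
    """Membership probabilities and total log-likelihood for the current fit."""
    log_dens = np.column_stack([np.log(w) + log_mvn(X, m, c)
                                for w, m, c in zip(weights, means, covs)])
    top = log_dens.max(axis=1, keepdims=True)          # log-sum-exp for stability
    log_tot = top + np.log(np.exp(log_dens - top).sum(axis=1, keepdims=True))
    return np.exp(log_dens - log_tot), float(log_tot.sum())

def em_tour(X, k, rng, max_iter, tol, diagonal):
    """One tour: random starting values, then EM until convergence or max_iter."""
    n, p = X.shape
    ridge = 1e-6 * np.eye(p)                           # keeps covariances invertible
    start_cov = np.atleast_2d(np.cov(X, rowvar=False))
    if diagonal:
        start_cov = np.diag(np.diag(start_cov))
    means = X[rng.choice(n, size=k, replace=False)].copy()
    covs = np.array([start_cov + ridge for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(max_iter):
        resp, ll = e_step(X, weights, means, covs)
        if ll - prev_ll < tol:                         # convergence criterion
            break
        prev_ll = ll
        nk = resp.sum(axis=0)                          # expected rows per cluster
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            cov = (resp[:, j, None] * diff).T @ diff / nk[j]
            if diagonal:
                cov = np.diag(np.diag(cov))            # zero the off-diagonal elements
            covs[j] = cov + ridge
    _, ll = e_step(X, weights, means, covs)            # log-likelihood of the final fit
    return ll, weights, means, covs

def normal_mixtures(X, k, tours=5, max_iter=200, tol=1e-8, diagonal=False, seed=0):
    """Run several tours and keep the one with the highest log-likelihood."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    fits = [em_tour(X, k, rng, max_iter, tol, diagonal) for _ in range(tours)]
    return max(fits, key=lambda fit: fit[0])
```

Keeping only the best tour is what makes restarts useful: a single EM run converges to whatever local solution is nearest its starting values, so comparing several independent starts reduces the chance of reporting a poor one.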

For an example of Normal Mixtures, open the Iris.jmp sample data table. This data set, first introduced by Fisher (1936), contains four measurements (sepal length, sepal width, petal length, and petal width) taken on 50 samples from each of three species of iris.

Note: Your results may not exactly match these results due to the random selection of initial centers.

On the Cluster launch dialog, assign all four variables to the Y, Columns role, select KMeans from the Method menu, and click OK. In the Iterative Clustering Control Panel, select Normal Mixtures from the Method menu, specify 3 for the Number of Clusters, and click Go. The report is shown in Normal Mixtures Report.

Normal Mixtures Report

The report gives summary statistics for each cluster:

•	number and proportion of observations

•	means for each variable

•	standard deviations for each variable

•	correlations between variables
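Because Normal Mixtures produces membership probabilities rather than hard assignments, per-cluster summaries like those listed above can be computed as probability-weighted statistics. The following Python sketch shows one such computation. The helper name is hypothetical, and it is an assumption that the report uses exactly these weighted formulas (including dividing by the total weight rather than by the weight minus one).

```python
import numpy as np

def cluster_summaries(X, resp):
    """Probability-weighted summaries per cluster (hypothetical helper).

    X    : (n, p) data matrix
    resp : (n, k) membership probabilities; each row sums to 1
    """
    n = X.shape[0]
    out = []
    for j in range(resp.shape[1]):
        w = resp[:, j]
        count = w.sum()                                 # expected rows in cluster j
        mean = w @ X / count                            # weighted mean of each column
        diff = X - mean
        cov = (w[:, None] * diff).T @ diff / count      # weighted covariance
        sd = np.sqrt(np.diag(cov))                      # weighted standard deviations
        corr = cov / np.outer(sd, sd)                   # weighted correlations
        out.append({"count": count, "proportion": count / n,
                    "mean": mean, "sd": sd, "corr": corr})
    return out
```

When every probability is 0 or 1, these reduce to the ordinary count, mean, standard deviation, and correlation of the rows in each cluster; with fractional probabilities, rows in overlap regions contribute partially to several clusters.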

The Cluster Comparison report gives fit statistics to compare different numbers of clusters. For KMeans Clustering and Self Organizing Maps, the fit statistic is CCC (Cubic Clustering Criterion). For Normal Mixtures, the fit statistic is BIC or AICc. Robust Normal Mixtures does not provide a fit statistic.
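BIC and AICc both penalize the maximized log-likelihood for the number of estimated parameters, and smaller values indicate a better trade-off between fit and complexity. The following Python sketch uses the standard textbook definitions. The parameter count assumes a mixture of k normal clusters on p columns with k - 1 free mixing proportions, k mean vectors, and k full (or diagonal) covariance matrices; the platform's own count may differ, for example when an outlier cluster is added.

```python
import math

def n_params(k, p, diagonal=False):
    """Free parameters in a k-cluster normal mixture on p columns (assumed count)."""
    cov_params = p if diagonal else p * (p + 1) // 2
    return (k - 1) + k * p + k * cov_params

def bic(loglik, n_par, n):
    """Bayesian information criterion: -2 log L + n_par * ln(n)."""
    return -2.0 * loglik + n_par * math.log(n)

def aicc(loglik, n_par, n):
    """Small-sample corrected AIC: AIC + 2 n_par (n_par + 1) / (n - n_par - 1)."""
    if n - n_par - 1 <= 0:
        raise ValueError("AICc requires n > n_par + 1")
    return -2.0 * loglik + 2.0 * n_par + 2.0 * n_par * (n_par + 1) / (n - n_par - 1)
```

For the Iris example (4 columns, 3 clusters, full covariance matrices), `n_params(3, 4)` gives 44 parameters under this count. To compare candidate numbers of clusters, you would fit each one, plug its maximized log-likelihood into `bic` or `aicc`, and prefer the smallest value.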

Robust Normal Mixtures

Use the Robust Normal Mixtures option if you suspect that your data contain multivariate outliers. Because regular Normal Mixtures is sensitive to outliers, Robust Normal Mixtures uses a more robust method to estimate the parameters. For details, see Statistical Details for Robust Estimation Methods.

To perform Robust Normal Mixtures, select it from the Method menu of the Iterative Clustering Control Panel (see Iterative Clustering Control Panel). The control panel then changes to the one shown in Robust Normal Mixtures Control Panel.

Robust Normal Mixtures Control Panel