Processes | Pattern Discovery | K-Means Clustering Method

Use the radio buttons to select the method used for joining the clusters. The Automated K Means method is selected by default.
 This method calls SAS/STAT PROC FASTCLUS. Refer to the for additional details. Choose this method to use the Correlation Radius f or Clustering parameter to determine the number of clusters. This method calls SAS/STAT PROC FASTCLUS. Refer to the for additional details. This method performs an iterative alternating fitting process to form the number of specified clusters. The method first selects a set of n points called cluster seeds as a first guess of the of the clusters. Each is assigned to the nearest seed to form a set of temporary clusters. The seeds are then replaced by the cluster means, the points are reassigned, and the process continues until no further changes occur in the clusters. The k -means approach is a special case of a general approach called the EM algorithm, where E stands for Expectation (the cluster means in this case) and the M stands for maximization, which means assigning points to closest clusters in this case. is an iterative technique, but rather than being a method to group rows, it is more of an estimation method to characterize the cluster groups. Rather than classifying each row into a cluster, it estimates the probability that a row is in each cluster. 1 The normal mixtures approach to clustering predicts the proportion of responses expected within each cluster. The assumption is that the joint probability of the measurement columns can be approximated using a mixture of multivariate normal distributions, which represent different clusters. The distributions have mean vectors and matrices for each cluster. Hierarchical and k -means clustering methods work well when clusters are well separated, but when clusters overlap, assigning each point to one cluster is problematic. In the overlap areas, there are points from several clusters sharing the same space. It is especially important to use normal mixtures rather than k -means clustering if you want an accurate estimate of the total in each group, because it is based on membership probabilities, rather than arbitrary cluster assignments based on borders.

McLachlan, G.J., and T. Krishnan. (1997) The EM Algorithm and Extensions. John Wiley and Sons. New York, NY.