Optimized Automated Clustering Method
Use the drop-down menu to specify the method to use for hierarchical clustering via PROC CLUSTER.
Note: This parameter is available only when either Optimized or Automated has been selected as the Compression Method.
Clustering methods are described in the following table:
AVERAGE: This method tends to join clusters with small variances and is biased toward producing clusters with the same variance.
DENSITY: Choose this method to use nonparametric probability density estimates (for example, Hartigan, 1975, pp. 205–212; Wong, 1982; Wong and Lane, 1983). Density linkage consists of two steps:
1. A new dissimilarity measure, d*, based on density estimates and adjacencies is computed.
2. A single linkage cluster analysis is performed using d*.
The CLUSTER procedure supports three types of density linkage: the kth-nearest-neighbor method, the uniform-kernel method, and Wong’s hybrid method.
FLEXIBLE: The flexible-beta method was developed by Lance and Williams (1967).
MCQUITTY: The method was independently developed by Sokal and Michener (1958) and McQuitty (1966).
MEDIAN: The median method was developed by Gower (1967).
WARD: Choose this method to set the distance between clusters to the ANOVA sum of squares between the two clusters, summed over all the variables. At each generation, the within-cluster sum of squares is minimized over all partitions that can be obtained by merging two clusters from the previous generation. The sums of squares are easier to interpret when they are divided by the total sum of squares to give the proportions of variance (squared semipartial correlations).
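The ANOVA sum-of-squares distance and the squared semipartial correlation described above can be illustrated with a small sketch. This is plain Python written for illustration only, not the PROC CLUSTER implementation; the function names and the sample points are assumptions made for this example.

```python
from typing import List, Sequence

Point = Sequence[float]


def centroid(cluster: Sequence[Point]) -> List[float]:
    """Mean of each variable over the points in one cluster."""
    n = len(cluster)
    return [sum(p[d] for p in cluster) / n for d in range(len(cluster[0]))]


def ward_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Between-cluster ANOVA sum of squares, summed over all variables.

    Equals the increase in within-cluster sum of squares caused by
    merging clusters a and b.
    """
    ca, cb = centroid(a), centroid(b)
    squared_gap = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared_gap


def total_sum_of_squares(points: Sequence[Point]) -> float:
    """Sum of squared deviations of all points from the overall mean."""
    c = centroid(points)
    return sum((x - m) ** 2 for p in points for x, m in zip(p, c))


def semipartial_r_squared(
    a: Sequence[Point], b: Sequence[Point], all_points: Sequence[Point]
) -> float:
    """Merge cost expressed as a proportion of the total variance."""
    return ward_distance(a, b) / total_sum_of_squares(all_points)


# Hypothetical data: two tight clusters far apart on the first variable.
a = [(0.0, 0.0), (2.0, 0.0)]
b = [(10.0, 0.0), (12.0, 0.0)]

print(ward_distance(a, b))                 # 100.0
print(semipartial_r_squared(a, b, a + b))  # about 0.96
```

In this example, merging the two clusters would account for about 96% of the total sum of squares, so the merge would be made late in the hierarchy.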

Sokal, R. R. and Michener, C. D. (1958). A Statistical Method for Evaluating Systematic Relationships. University of Kansas Science Bulletin 38: 1409–1438.

Milligan, G. W. (1980). An Examination of the Effect of Six Types of Error Perturbation on Fifteen Clustering Algorithms. Psychometrika 45: 325–342.

Hartigan, J. A. (1975). Clustering Algorithms. New York: John Wiley & Sons.

Wong, M. A. (1982). A Hybrid Clustering Method for Identifying High-Density Clusters. Journal of the American Statistical Association 77: 841–847.

Wong, M. A. and Lane, T. (1983). A kth Nearest Neighbor Clustering Procedure. Journal of the Royal Statistical Society.

Lance, G. N. and Williams, W. T. (1967). A General Theory of Classificatory Sorting Strategies. I. Hierarchical Systems. Computer Journal 9: 373–380.

McQuitty, L. L. (1966). Similarity Analysis by Reciprocal Pairs for Discrete and Continuous Data. Educational and Psychological Measurement 26: 825–831.

Gower, J. C. (1967). A Comparison of Some Methods of Cluster Analysis. Biometrics 23: 623–637.

Your choice of method might require additional options to be specified in the Additional PROC CLUSTER Options text field on the Options tab. The following methods require, or work better with, additional options: COMPLETE (TRIM= recommended); DENSITY (K=, R=, or HYBRID option must be specified); FLEXIBLE (see the BETA= option); TWOSTAGE (K=, R=, or HYBRID option must be specified); and WARD (TRIM= recommended).
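The requirements listed above can be expressed as a small pre-flight check on the text you intend to enter in the Additional PROC CLUSTER Options field. The following Python sketch is illustrative only: the function name, the message wording, and the simple whitespace-based parsing are assumptions made for this example, and the software itself does not expose such a function.

```python
import re
from typing import List

# Methods for which at least one of these options must be specified.
REQUIRED_ANY = {
    "DENSITY": ("K=", "R=", "HYBRID"),
    "TWOSTAGE": ("K=", "R=", "HYBRID"),
}

# Methods for which an option is recommended or worth reviewing.
SUGGESTED = {
    "COMPLETE": "TRIM=",
    "WARD": "TRIM=",
    "FLEXIBLE": "BETA=",
}


def _tokens(options_text: str) -> List[str]:
    """Upper-case the text, drop spaces around '=', and split on whitespace."""
    normalized = re.sub(r"\s*=\s*", "=", options_text.upper())
    return normalized.split()


def _has(tokens: List[str], option: str) -> bool:
    """True if a keyword option (HYBRID) or a NAME= option is present."""
    if option.endswith("="):
        return any(t.startswith(option) for t in tokens)
    return option in tokens


def check_cluster_options(method: str, options_text: str) -> List[str]:
    """Return messages about missing options; an empty list means none."""
    method = method.strip().upper()
    tokens = _tokens(options_text)
    messages = []

    required = REQUIRED_ANY.get(method)
    if required and not any(_has(tokens, o) for o in required):
        messages.append(f"{method} requires one of: " + ", ".join(required))

    suggested = SUGGESTED.get(method)
    if suggested and not _has(tokens, suggested):
        messages.append(f"{method}: consider specifying {suggested}")

    return messages


print(check_cluster_options("DENSITY", ""))     # one "requires" message
print(check_cluster_options("DENSITY", "K=5"))  # []
```

The check only looks for the presence of an option, not for a valid value; valid ranges are described in the SAS PROC CLUSTER documentation.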
To Specify a Clustering Method:
Specify Optimized as the Compression Method.
Refer to the SAS PROC CLUSTER documentation for details about all of these methods.