Self Organizing Maps

The Self-Organizing Maps (SOMs) technique was developed by Teuvo Kohonen (1989) and further extended by a number of other neural network enthusiasts and statisticians. The original SOM was cast as a learning process, like the original neural net algorithms, but the version implemented here is done in a much more straightforward way as a simple variation on k-means clustering. In the SOM literature, this would be called a batch algorithm using a locally weighted linear smoother.

The goal of a SOM is not only to form clusters, but also to arrange them in a particular layout on a cluster grid, so that points in clusters that are near each other in the SOM grid are also near each other in multivariate space. In classical k-means clustering, the structure of the clusters is arbitrary, but in a SOM the clusters have a grid structure. This grid structure helps you interpret the clusters in two dimensions: clusters that are close together on the grid are more similar than clusters that are far apart.
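As a rough illustration of this property (not a JMP feature), the following Python sketch measures how closely distances on the grid track distances between cluster centroids. The function names, the 2 x 3 grid, and the centroid values are hypothetical and chosen only for the example; a correlation near 1 means that grid neighbors are also neighbors in the variable space.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance between every pair of rows in `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def grid_feature_agreement(grid, centers):
    """Correlation between grid distance and centroid distance over all cluster pairs."""
    grid = np.asarray(grid, dtype=float)
    centers = np.asarray(centers, dtype=float)
    upper = np.triu_indices(len(grid), k=1)      # each pair of clusters once
    grid_dist = pairwise_distances(grid)[upper]
    center_dist = pairwise_distances(centers)[upper]
    return np.corrcoef(grid_dist, center_dist)[0, 1]

# Hypothetical 2 x 3 cluster grid: (row, column) position of each cluster.
grid = np.array([[r, c] for r in range(2) for c in range(3)])

# Hypothetical centroids in three variables that follow the grid layout exactly.
ordered_centers = 2.0 * (grid @ np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))

# The same centroids attached to arbitrary grid cells, as with unstructured clusters.
shuffled_centers = ordered_centers[[3, 1, 5, 0, 4, 2]]

ordered_score = grid_feature_agreement(grid, ordered_centers)
shuffled_score = grid_feature_agreement(grid, shuffled_centers)
```

Here `ordered_score` is higher than `shuffled_score`, because the second arrangement breaks the link between grid position and position in the variable space.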

To create a Self Organizing Map, select that option from the Method menu of the Iterative Clustering Control Panel. After you select Self Organizing Map, the control panel appears as shown in Self Organizing Map Control Panel.

Self Organizing Map Control Panel

Some of the options on the panel are described in K-Means Control Panel. The other options are described below:

N Rows

is the number of rows in the cluster grid.

N Columns

is the number of columns in the cluster grid.

Bandwidth

determines the effect of neighboring clusters for predicting centroids. A higher bandwidth results in a more detailed fitting of the data.

Self Organizing Map Report

The report gives summary statistics for each cluster:

•	count of observations

•	means for each variable

•	standard deviations for each variable.

The Cluster Comparison report gives fit statistics that you can use to compare different numbers of clusters. For K-Means Clustering and Self Organizing Maps, the fit statistic is CCC (Cubic Clustering Criterion). For Normal Mixtures, the fit statistic is BIC or AICc. Robust Normal Mixtures does not provide a fit statistic.

For details about the red-triangle options for Self Organizing Maps, see K-Means Platform Options.

Implementation Technical Details

The SOM implementation in JMP proceeds as follows:

•	The first step is to obtain initial cluster seeds that provide good coverage of the multidimensional space. JMP uses principal components to determine the two directions that capture the most variation in the data.

•	JMP then lays out a grid in this principal component space with its edges 2.5 standard deviations from the middle in each direction. The cluster seeds are formed by translating this grid back into the original space of the variables.

•	The cluster assignment proceeds as with k-means, with each point assigned to the cluster closest to it.

•	The means are estimated for each cluster as in k-means. JMP then uses these means to set up a weighted regression with each variable as the response in the regression, and the SOM grid coordinates as the regressors. The weighting function uses a ‘kernel’ function that gives large weight to the cluster whose center is being estimated, with smaller weights given to clusters farther away from the cluster in the SOM grid. The new cluster means are the predicted values from this regression.

•	These iterations proceed until the process has converged.
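
The steps above can be sketched in Python with NumPy. This is a minimal illustration of the described procedure, not JMP's actual code, and it rests on several assumptions that the description leaves open: the kernel is Gaussian in grid distance, the regression weights are also multiplied by the cluster counts (so empty clusters carry no weight), the data are used unscaled, grid rows follow the first principal component and grid columns the second, and convergence means that no centroid coordinate moves by more than a small tolerance. The names `som_seeds` and `som_fit`, and the parameters `extent`, `bandwidth`, `max_iter`, and `tol`, are hypothetical. In particular, the `bandwidth` argument is not claimed to match the scale or direction of the Bandwidth option on the control panel.

```python
import numpy as np

def som_seeds(X, n_rows, n_cols, extent=2.5):
    """Lay a grid in the plane of the first two principal components
    and map it back to the original variables (assumes >= 2 variables)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    sd = s[:2] / np.sqrt(len(X) - 1)          # std. dev. of the first two PC scores
    ii, jj = np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing="ij")
    grid = np.column_stack([ii.ravel(), jj.ravel()])   # (row, column) per cluster
    # Grid edges sit `extent` standard deviations from the middle.
    r = np.linspace(-extent, extent, n_rows)[grid[:, 0]] * sd[0]
    c = np.linspace(-extent, extent, n_cols)[grid[:, 1]] * sd[1]
    seeds = mean + np.column_stack([r, c]) @ Vt[:2]
    return seeds, grid

def assign(X, centers):
    """k-means style assignment: index of the closest center for each point."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def som_fit(X, n_rows, n_cols, bandwidth=1.0, max_iter=100, tol=1e-6):
    X = np.asarray(X, dtype=float)
    centers, grid = som_seeds(X, n_rows, n_cols)
    K = len(centers)
    design = np.column_stack([np.ones(K), grid])       # intercept, row, column
    grid_d2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-grid_d2 / (2.0 * bandwidth ** 2))  # assumed Gaussian kernel

    for _ in range(max_iter):
        labels = assign(X, centers)
        counts = np.bincount(labels, minlength=K)
        sums = np.zeros_like(centers)
        np.add.at(sums, labels, X)                      # per-cluster sums
        means = sums / np.maximum(counts, 1)[:, None]   # k-means cluster means

        new_centers = np.empty_like(centers)
        for k in range(K):
            # Weighted linear regression of the cluster means on grid
            # coordinates, with the largest kernel weight at cluster k itself.
            sw = np.sqrt(kernel[k] * counts)[:, None]
            beta, *_ = np.linalg.lstsq(design * sw, means * sw, rcond=None)
            new_centers[k] = design[k] @ beta           # predicted value at cluster k

        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:                                 # assumed convergence rule
            break

    return centers, assign(X, centers), grid
```

For a data matrix `X` with one row per observation, `som_fit(X, 3, 4)` returns 12 centroids, a cluster label for each observation, and the (row, column) grid position of each cluster. With this Gaussian kernel, a small `bandwidth` makes each centroid depend almost entirely on its own cluster mean, so the result approaches ordinary k-means, while a large `bandwidth` pulls all centroids toward a single plane fitted through the grid.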