Multivariate Control Chart

What is a multivariate control chart?

A multivariate control chart monitors several related process variables on a single control chart. It takes advantage of the correlation among those variables.

Why use multivariate control charts?

Shewhart charts are good for monitoring one process characteristic at a time. Often, however, more than one characteristic is measured per subgroup. For example, you might want to measure percent impurities for multiple types of impurities, chemical compositions in a batch of chemicals, physical characteristics of a sample, or air pollutants at a specific point in time. These characteristics are often correlated.

Univariate Shewhart charts can be used to control each component’s variability. Multivariate control charts can be used to control the correlation, to make sure that the relationship between the variables is stable over time.

Figure 1: Illustration of correlated variables with one unusual observation (far right). The individual observations appear in control on the two univariate control charts on the left.

The graph on the far right of Figure 1 shows 100 observations from two highly correlated variables, with a correlation coefficient of about 0.8. When the process is stable, the observations stay within the 99% density ellipse. The process can go out of control with a shift in the mean or variance of either variable or with a shift in the correlation between the variables. The two univariate charts on the left pick up a shift in the mean or variance of the variables, but only the multivariate chart signals a shift in the correlation structure. The red observation is in control on both univariate control charts, yet it is unusual because it doesn’t follow the correlation. For example, tall women tend to have long feet, and short women tend to have small feet. In a system measuring women’s heights and foot lengths, a tall woman with small feet is an example of a point that would fall outside of the correlation structure.
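The situation in Figure 1 can be reproduced with a small calculation. The sketch below is illustrative only: it assumes two standardized, bivariate normal variables with known zero means, unit variances, and a correlation of 0.8, and it uses a made-up observation at +2 and -2 standard deviations. The names `point`, `distance_sq`, and `ellipse_limit` are hypothetical. For two variables with known parameters, the 99% density ellipse corresponds to a squared statistical distance of -2 ln(0.01), which is about 9.21.

```python
import math
import numpy as np

# Assumed in-control parameters (standardized variables, correlation 0.8).
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])

# Hypothetical observation: high on one variable, low on the other.
point = np.array([2.0, -2.0])

# Univariate view: each coordinate is within +/- 3 standard deviations.
z_scores = (point - mean) / np.sqrt(np.diag(cov))
univariate_ok = bool(np.all(np.abs(z_scores) < 3.0))

# Multivariate view: squared statistical distance to the centroid,
# which accounts for the correlation between the two variables.
diff = point - mean
distance_sq = float(diff @ np.linalg.solve(cov, diff))

# Boundary of the 99% density ellipse for two variables with known
# parameters: the chi-square(2) quantile, which equals -2 * ln(1 - 0.99).
ellipse_limit = -2.0 * math.log(1.0 - 0.99)

print(univariate_ok)                # True
print(round(distance_sq, 2))        # 40.0
print(round(ellipse_limit, 2))      # 9.21
print(distance_sq > ellipse_limit)  # True
```

The point passes both univariate checks but lies far outside the ellipse, because a high value on one variable paired with a low value on the other contradicts the positive correlation.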

What is Hotelling’s T² chart?

One measure of how scattered the data are around their mean, or centroid, is given by Hotelling’s T² statistic. Hotelling’s T² is the multivariate analog of the Student’s t statistic in the univariate case. It measures the distance from each data point to the centroid of the data cloud in n-dimensional space, taking the correlation into account. When the distance between an observation and the centroid is larger than expected, the chart will signal.

For row i, the statistic is calculated as $T_i^2 = (\mathbf{y}_i - \mu)^{\prime} \Sigma^{-1} (\mathbf{y}_i - \mu)$, where $\Sigma$ is the covariance matrix of the Y variables, $\mu$ is the true mean vector, and $\mathbf{y}_i$ is the vector of observations in the ith row. The T² statistics can be plotted on a control chart with appropriate limits. (For details on limit calculation, see section 4.2 of Statistical Process Control Course - JMP User Community.)
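As a minimal sketch of the formula, the Python code below computes one T² value per row with NumPy. It makes two assumptions that are not part of the definition above: the true mean vector and covariance matrix are replaced by the sample mean and sample covariance of the same data, and the data are simulated. The function name `hotelling_t2` is hypothetical, and control limits are deliberately left out.

```python
import numpy as np

def hotelling_t2(data: np.ndarray) -> np.ndarray:
    """Return one T^2 value per row of data (rows = observations).

    Assumption: mu and Sigma are estimated by the sample mean and the
    sample covariance (n - 1 divisor) of the same data.
    """
    centered = data - data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    # Solve Sigma x = (y_i - mean) for every row instead of inverting Sigma.
    solved = np.linalg.solve(cov, centered.T).T
    # Row-wise dot product gives (y_i - mean)' Sigma^{-1} (y_i - mean).
    return np.einsum("ij,ij->i", centered, solved)

# Simulated example data: 100 rows, two variables with correlation 0.8.
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.8],
                     [0.8, 1.0]])
data = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=100)

t2 = hotelling_t2(data)
print(t2.shape)  # (100,)
```

Solving the linear system rather than forming the inverse explicitly is a common numerical choice; it gives the same statistic.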

Example of a T² control chart

Suppose you work for a publishing company that prints paperback books. You are interested in controlling the process that prints the covers for the books. Each day, you print a test page on the same paper stock used to cover your books. The test page contains regions that are printed in seven colors (red, orange, yellow, green, blue, indigo, and violet). It also contains three shapes: a rectangle whose height is four times its width, a square, and a rectangle whose width is four times its height.

Every day, you use a colorimeter to measure the wavelength of reflected light from the seven colored regions and calipers to measure the height and width of the three shapes, for a total of 13 variables measured (seven color measurements plus a height and a width for each of the three shapes). We will use simulated data from this process to illustrate the T² and other multivariate control charts.

Figure 2: T² control chart on 13 printing variables measured daily. No points are beyond the upper control limit. The T² chart will signal if the mean vector of the 13 variables has changed.

What is a model-driven multivariate control chart?

A model-driven multivariate control chart (MDMVCC) enhances the T² chart with automatic model selection based on principal components analysis (PCA) of the data, a control chart on the normalized distance from each data value to the PCA model (DModX), and a contribution proportion plot.

A T² chart on all the variables monitors everything in the data, including noise. A T² chart on the important principal components monitors just the important directions in the data and not the noise components. The MDMVCC fits a PCA model to the data, retains the smallest number of components that together explain at least 85% of the variability, and then calculates a T² statistic on those components. (For details on model-driven multivariate control charts, see section 4.2 of Statistical Process Control Course - JMP User Community.)
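The steps in the paragraph above can be sketched in a few lines of NumPy. This is not JMP's implementation. It assumes PCA on the correlation matrix, picks the smallest number of components whose cumulative share of variance reaches 85%, and computes T² as the sum of squared scores divided by their eigenvalues. The function name `pca_t2` and the simulated 13-variable data set (three latent factors plus noise) are hypothetical.

```python
import numpy as np

def pca_t2(data: np.ndarray, min_explained: float = 0.85):
    """T^2 on the leading principal components of the correlation matrix."""
    # Standardize so that the PCA operates on the correlation matrix.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(data, rowvar=False)

    # eigh returns eigenvalues in ascending order; sort them descending.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Smallest k whose cumulative explained share reaches the threshold.
    cum_share = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cum_share, min_explained)) + 1
    k = min(k, len(eigvals))  # guard against round-off when the threshold is 1.0

    # Scores on the retained components, each scaled by its eigenvalue.
    scores = z @ eigvecs[:, :k]
    t2 = np.sum(scores**2 / eigvals[:k], axis=1)
    return t2, k, cum_share

# Simulated stand-in data: 7 + 3 + 3 variables driven by 3 latent factors.
rng = np.random.default_rng(1)
n = 50
factors = rng.normal(size=(n, 3))
loadings = np.zeros((3, 13))
loadings[0, :7] = 1.0
loadings[1, 7:10] = 1.0
loadings[2, 10:] = 1.0
data = factors @ loadings + 0.3 * rng.normal(size=(n, 13))

t2, k, cum_share = pca_t2(data)
print(k, round(float(cum_share[k - 1]), 3))
```

The printed values show how many components were retained and the share of variability they explain for this simulated data set.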

Learn how to create a model-driven multivariate control chart in JMP.

https://www.youtube.com/watch?v=Y3ciILGN2Vo

To see more quality and reliability JMP tutorials, visit JMP's Quality and Reliability playlist on YouTube.
To follow along using the sample data included with the software, download a free trial of JMP.

Example of model-driven multivariate control chart

First, let’s perform PCA on the 13 variables. The analysis tells us that instead of 13 independent variables, there are three strong directions in the variable space.

Figure 3: Eigenvalues of the correlation matrix, showing three strong principal components.

Examination of the loading plot tells us that these three dimensions correspond to the seven color variables, the three vertical measurements, and the three horizontal measurements. PC1 describes the color variables vs. the height and width variables; PC2 describes the vertical vs. the horizontal variables; and PC3 describes the vertical and horizontal variables further.

Figure 4: Loading plot of three components, showing groupings of the original variables.

These three components explain over 94% of the variation in the data. We can use a T² chart on the three important components to control the process mean vector.

Figure 5: T² chart on first three principal components. The process is in control.

It can be useful to also plot the residual values from the principal component model. DModX is the scaled squared prediction error and will help monitor changes in both the mean level and correlation of the multivariate process.
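To make the idea of a residual chart concrete, the sketch below computes the squared prediction error (SPE) of each row with respect to a k-component PCA model. The scaling shown, dividing each SPE by the average SPE of the reference data, is an assumption chosen for illustration; JMP's exact DModX normalization may differ. The function name `pca_spe` and the simulated data are hypothetical.

```python
import numpy as np

def pca_spe(data: np.ndarray, k: int):
    """Squared prediction error of each row from a k-component PCA model."""
    # PCA on the correlation matrix, as in a standardized analysis.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(data, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Project onto the retained components and reconstruct.
    retained = eigvecs[:, :k]
    residual = z - (z @ retained) @ retained.T

    # SPE is the squared length of the part the model does not explain.
    spe = np.sum(residual**2, axis=1)
    return spe, eigvals

# Simulated stand-in data: 13 variables driven by 3 latent factors.
rng = np.random.default_rng(2)
n, k = 50, 3
factors = rng.normal(size=(n, 3))
loadings = np.zeros((3, 13))
loadings[0, :7] = 1.0
loadings[1, 7:10] = 1.0
loadings[2, 10:] = 1.0
data = factors @ loadings + 0.3 * rng.normal(size=(n, 13))

spe, eigvals = pca_spe(data, k)

# Assumed scaling for illustration: a typical in-control row is near 1.
scaled_spe = spe / spe.mean()
print(scaled_spe.shape)  # (50,)
```

A row that breaks the correlation structure moves away from the plane spanned by the retained components, so its residual grows even when its scores, and therefore its T² value, stay moderate.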

Figure 6: DModX chart on the first three principal components. The process is in control.

Let’s introduce a change in the distributions used to simulate the data. There are 50 observations in the in-control data. In the out-of-control data, we’ll introduce a change in the correlation matrix at Observation 51 and a change in the mean vector at Observation 81. The in-control data are used to set the control limits. The T² chart limits change depending on whether the correlation matrix can be considered unknown (Phase I) or known (Phase II).

Figure 7: T² chart on out-of-control data. The correlation shift at Observation 51 is not detected. The mean shift at Observation 81 is detected at Observation 85.

Figure 8: DModX chart on out-of-control data. The correlation shift at Observation 51 is detected immediately. The mean shift at Observation 81 is also detected.

In JMP, you can investigate the signals directly from the chart.

Figure 9: Graphlet for Observation 85 indicates the vertical variables are out of control. This instability in the vertical variables is what is driving the out-of-control signal at Observation 85.

Figure 10: Contribution plot for Observation 85 with individual control charts. Shape 1 Y shows a greater mean shift than Shape 2 Y, and even greater than Shape 3 Y. The other variables’ control charts are in control for that observation.

You can also examine how much of the T² statistic is due to each variable. The contributions for all observations are shown in a heat map.

Figure 11: T² contribution proportion heat map showing strong contribution from Shape 1 Y in about the last 15 observations, as seen by the darker colors in Shape 1 Y starting in late April 2025.
