The Water Treatment.jmp data set contains daily measurements from 38 sensors in an urban wastewater treatment plant. You want to explore these data for potential outliers, which could result from sensor failures, storms, and other unusual situations.
1. Select Help > Sample Data Library and open Water Treatment.jmp.
2. Select Analyze > Screening > Explore Outliers.
3. Select the Sensor Measurements column group and click Y, Columns.
4.
5. Select Multivariate k-Nearest Neighbor Outliers.
7.
Figure 2.9 Outliers in Multivariate k-Nearest Neighbor Outliers Example
Notice the three extreme outliers selected in the K Nearest Neighbors plots in Figure 2.9. Each of these three rows corresponds to a date when the secondary settler in the water treatment plant was reported as malfunctioning. Because these three data points are due to faulty equipment, exclude them from future analyses.
9.
Click Rerun.
11.
Figure 2.10 Outliers in Multivariate k-Nearest Neighbors Example
Now locate the two light-green outliers close to row 400. Notice how they tend to stay close to each other as k increases. These two rows correspond to dates when the water treatment plant experienced solids overloads. These data points have a relatively high Distance to Neighbor K=13, but they result from a situation that you want to include in your study, so you do not exclude them. Instead, you keep them in mind as you conduct further analyses.
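
The idea behind this example can be sketched outside of JMP. The following Python snippet is a minimal illustration only, not JMP's implementation: it assumes Euclidean distance on standardized columns, uses synthetic data rather than Water Treatment.jmp, and the function name kth_neighbor_distances, the list of k values, and the three planted rows are all hypothetical. It computes each row's distance to its k-th nearest neighbor, flags the three rows with the largest distance at k=13, excludes them, and reruns the calculation.

```python
import numpy as np


def kth_neighbor_distances(X, ks):
    """Distance from each row to its k-th nearest neighbor, one column per k."""
    X = np.asarray(X, dtype=float)
    # Assumption: standardize columns so no single sensor dominates the distance.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std
    # Full pairwise Euclidean distances; acceptable for a few hundred rows.
    diff = Z[:, None, :] - Z[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))
    # After sorting each row, column 0 is the row's distance to itself (0),
    # so column k holds the distance to the k-th nearest neighbor.
    D.sort(axis=1)
    return D[:, list(ks)]


# Synthetic stand-in data: 200 "days" of 5 "sensors" (not the real data set).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Plant three hypothetical equipment-failure rows far from the rest.
X[10, 0] += 25.0
X[50, 1] -= 25.0
X[120, 2] += 25.0

ks = (1, 2, 3, 5, 8, 13)  # hypothetical set of k values; 13 matches the example
dist = kth_neighbor_distances(X, ks)
d13 = dist[:, -1]  # distance to neighbor K=13

# Flag the three rows with the largest K=13 distance.
flagged = np.sort(np.argsort(d13)[-3:])

# Exclude the flagged rows and rerun, as in the Exclude and Rerun steps.
kept = np.setdiff1d(np.arange(len(X)), flagged)
dist_rerun = kth_neighbor_distances(X[kept], ks)

print("flagged rows:", flagged.tolist())
print("max K=13 distance before exclusion:", d13.max())
print("max K=13 distance after exclusion: ", dist_rerun[:, -1].max())
```

Rerunning after exclusion matters because extreme rows inflate the column standard deviations and stretch the plot scale. Once they are removed, moderately unusual rows, such as the two near row 400 in the example, become easier to see.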

Help created on 7/12/2018