Explore Outliers Utility

Exploring and understanding outliers in your data is an important part of analysis. Outliers can result from mistakes in data collection or reporting, measurement system failures, or the inclusion of error or missing value codes in the data set. Outliers can distort estimates, so analyses that include them are biased toward those outliers. Outliers also inflate the sample variance. Sometimes, however, retaining outliers is necessary: removing legitimate values can underestimate the sample variance and bias the results in the opposite direction.

Whether you remove or retain outliers, you must first locate them. There are many ways to visually inspect for outliers. For example, box plots, histograms, and scatter plots often make these extreme values easy to see. See Visualize Your Data in Discovering JMP.

The Explore Outliers tool provides four different options to identify, explore, and manage outliers in your univariate or multivariate data.

Quantile Range Outliers

Uses the quantile distribution of each column to identify outliers as extreme values. This tool is useful for discovering missing value or error codes within the data. This is the recommended method to begin exploring outliers in your data. See Quantile Range Outliers.
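
The idea behind a quantile range rule can be sketched outside of JMP. The following Python example is a minimal illustration, not the platform's implementation: the function names, the tail quantile of 0.1, and the multiplier of 3 are assumptions chosen for the example. It flags values that lie more than a multiple of the interquantile range beyond the lower and upper tail quantiles.

```python
def quantile(sorted_vals, p):
    """Linearly interpolated quantile of an already sorted list."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def quantile_range_outliers(values, tail=0.1, q=3.0):
    """Return values beyond q interquantile ranges past the tail quantiles.

    tail and q are illustrative settings (assumptions for this sketch).
    """
    vals = sorted(values)
    low = quantile(vals, tail)
    high = quantile(vals, 1 - tail)
    spread = high - low  # interquantile range
    lower_fence = low - q * spread
    upper_fence = high + q * spread
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical column in which 999 was entered as a missing value code.
ages = [23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 999]
flagged = quantile_range_outliers(ages)
print(flagged)
```

Because the fences are built from quantiles rather than the mean and standard deviation, a single extreme code such as 999 has little influence on where the fences fall, which is why a rule of this kind tends to surface such codes.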

Robust Fit Outliers

Finds robust estimates of the center and spread of each column and identifies outliers as values that are far from those estimates. See Robust Fit Outliers.
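
As a rough illustration of the robust center-and-spread idea, the Python sketch below uses the median and the scaled median absolute deviation (MAD). These are stand-in estimators chosen for simplicity and are not claimed to be the estimators JMP uses; the function name and the threshold k = 4 are also assumptions for the example.

```python
import statistics


def robust_outliers(values, k=4.0):
    """Flag values more than k robust spreads from a robust center.

    Center: median. Spread: 1.4826 * MAD, which approximates the
    standard deviation for normally distributed data. Both are
    stand-ins chosen for this sketch.
    """
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    spread = 1.4826 * mad
    if spread == 0:
        # More than half the values are identical; the rule is undefined.
        return []
    return [v for v in values if abs(v - center) > k * spread]


# Hypothetical measurements with one suspicious reading.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 15.0]
suspect = robust_outliers(readings)
print(suspect)
```

The point of using robust estimates is that the extreme reading barely moves the center or the spread, so it cannot mask itself the way it can when the mean and standard deviation are used.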

Multivariate Robust Outliers

Uses the Multivariate platform with the Robust option to find outliers based on the Mahalanobis distance from the estimated robust center. See Multivariate Robust Outliers.
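
The Mahalanobis distance itself is straightforward to compute once a center and covariance matrix are available. The Python sketch below covers only the two-variable case and takes the center and covariance as inputs; the values used are made up and stand in for the robust estimates that the platform would compute. The function name is hypothetical.

```python
import math


def mahalanobis_2d(point, center, cov):
    """Mahalanobis distance of a 2-D point from a center.

    cov is a 2x2 covariance matrix given as ((a, b), (c, d)).
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    # The inverse of a 2x2 matrix is [[d, -b], [-c, a]] / det.
    squared = (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det
    return math.sqrt(squared)


# Assumed estimates: variance 4 in the first variable, 1 in the second.
center = (0.0, 0.0)
cov = ((4.0, 0.0), (0.0, 1.0))

along_wide_axis = mahalanobis_2d((2.0, 0.0), center, cov)
along_narrow_axis = mahalanobis_2d((0.0, 2.0), center, cov)
print(along_wide_axis, along_narrow_axis)
```

Both points are the same Euclidean distance from the center, but the point that lies along the low-variance direction has the larger Mahalanobis distance. That scaling by the covariance is what lets the distance identify points that are unusual given the joint spread of the variables.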

Multivariate k-Nearest Neighbor Outliers

Finds outliers as values far from their k-nearest neighbors. See Multivariate k-Nearest Neighbor Outliers.
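
A nearest neighbor score can be sketched as the distance from each row to its k-th closest other row. The Python example below is a brute-force illustration under assumptions: Euclidean distance, k = 2, and the function name are choices made for this sketch, not details taken from the platform.

```python
import math


def kth_neighbor_distances(points, k=1):
    """Distance from each point to its k-th nearest other point.

    Brute force over all pairs, which is fine for small examples.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(
            math.dist(p, other) for j, other in enumerate(points) if j != i
        )
        scores.append(dists[k - 1])
    return scores


# Hypothetical data: a tight cluster plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = kth_neighbor_distances(points, k=2)
most_isolated = points[scores.index(max(scores))]
print(most_isolated)
```

Points inside the cluster have small scores because their neighbors are close, while the isolated point has a large score for any small k. Looking at more than one value of k helps when outliers occur in small groups, because members of such a group can be close to one another while still being far from the rest of the data.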
