To launch Explore Outliers, select Analyze > Screening > Explore Outliers. The launch window appears.
Explore Outliers Utility Launch Window
In the launch window, select the analysis columns as Y, Columns. You can also specify a By variable. After you click OK, the Explore Outliers report appears. You are presented with the following four outlier analysis commands:
Quantile Range Outliers Window
An outlier is any value that lies more than Q times the interquantile range below the lower quantile or above the upper quantile. You can adjust the value of Q and the size of the interquantile range.
The multiplier that helps determine values as outliers. Outliers are considered Q times the interquantile range past the Tail Quantile and 1 - Tail Quantile values. Large values of Q provide a more conservative set of outliers than small values. The default is 3.
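The quantile range rule described above can be sketched in a few lines of Python. This is an illustrative approximation, not the utility's own implementation: the function names, the linear-interpolation quantile method, and the tail quantile of 0.1 are assumptions made for the example. Only the default Q of 3 comes from the text.

```python
def quantile(sorted_vals, p):
    """Linear-interpolation quantile of an already sorted list (0 <= p <= 1).

    The interpolation method is an assumption; other software may use
    a different quantile definition.
    """
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def quantile_range_outliers(values, tail_quantile=0.1, q=3.0):
    """Return (low_threshold, high_threshold, outliers).

    tail_quantile=0.1 is an assumed value for illustration; q=3 is the
    default multiplier stated in the text.
    """
    s = sorted(values)
    lower = quantile(s, tail_quantile)
    upper = quantile(s, 1.0 - tail_quantile)
    spread = upper - lower  # the interquantile range
    low_threshold = lower - q * spread
    high_threshold = upper + q * spread
    outliers = [v for v in values if v < low_threshold or v > high_threshold]
    return low_threshold, high_threshold, outliers


# Twenty evenly spaced values plus one extreme value.
data = list(range(1, 21)) + [200]
low, high, flagged = quantile_range_outliers(data)
print(low, high, flagged)
```

Raising q widens both thresholds, so fewer values are flagged, which is the sense in which a large Q is more conservative.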
Turns on the exclude row state for the selected rows. Click Rescan to update the Quantile Range Outliers report.
Note: The first time you choose an action (such as Change to Missing or Exclude Rows) to change your data, an alert window warns you to use the Save As command to save your data table as a new file to preserve a copy of your original data. When this window appears, click OK. If you decide to save your new data file, select File > Save As and save the file with a new name.
Robust Fit Outliers Window
Given a robust estimate of the center and spread, outliers are defined as those values that are more than K times the robust spread from the robust center. The Robust Fit Outliers window provides several options for calculating the robust estimates and the multiplier K, and it provides tools to manage the outliers that are found.
The multiplier that determines outliers as K times the spread away from the center. Large values of K provide a more conservative set of outliers than small values. The default is 4.
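The rule of flagging values more than K times the robust spread from the robust center can be illustrated as follows. This is a sketch under stated assumptions: it uses the median as the center and the scaled median absolute deviation (MAD) as the spread, which is a common robust choice but is not necessarily one of the estimation options offered in the window. The function name is hypothetical; the default K of 4 comes from the text.

```python
import statistics


def robust_outliers(values, k=4.0):
    """Return (center, spread, outliers) using median and scaled MAD.

    Median/MAD is a stand-in robust estimator chosen for this example,
    not necessarily the estimator used by the utility.
    """
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    # 1.4826 makes the MAD comparable to a standard deviation
    # when the data are normally distributed.
    spread = 1.4826 * mad
    if spread == 0:
        # No usable spread estimate; flag nothing rather than everything.
        return center, spread, []
    outliers = [v for v in values if abs(v - center) > k * spread]
    return center, spread, outliers


data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 25.0]
center, spread, flagged = robust_outliers(data)
print(center, spread, flagged)
```

Because the median and MAD are barely affected by the single extreme value, that value stays far outside the K times spread band instead of inflating the spread and hiding itself.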
Changes the outlier value to a missing value in the data table. Click Rescan to update the Robust Estimates and Outliers report.
You can save the distances to the data table by selecting the Save option from the Mahalanobis Distances red triangle menu.
Multivariate Robust Outliers Mahalanobis Distance Plot
Multivariate Robust Outliers Mahalanobis Distance Plot shows the Mahalanobis distances of 16 different columns. The plot contains an upper control limit (UCL) of 4.82. This UCL is meant to be a helpful guide to show where potential outliers might be. However, you should use your own discretion to determine which values are outliers. For more details about this upper control limit (UCL), see Mason and Young (2002).
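To make the idea of a Mahalanobis distance concrete, the sketch below computes it for two columns only. It is a simplified illustration with several assumptions: it uses the classical sample mean and covariance rather than robust estimates, it handles only the two-variable case, the function name and data are hypothetical, and it does not compute the UCL.

```python
import math


def mahalanobis_2d(points):
    """Mahalanobis distance of each (x, y) point from the sample mean.

    Uses the classical (non-robust) mean and covariance, which is an
    assumption made to keep the example short.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the inverse of the 2x2 covariance matrix.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(max(d2, 0.0)))
    return distances


# Six points near the line y = 2x and one point far off that trend.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1),
          (5.0, 9.8), (6.0, 12.1), (3.5, 1.0)]
distances = mahalanobis_2d(points)
for p, d in zip(points, distances):
    print(p, round(d, 3))
```

The last point is unremarkable in either column taken alone, yet it has the largest distance because it breaks the correlation between the two columns. That is the kind of point a distance plot is meant to expose.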
The basic approach of outlier detection is to consider points distant from other points as outliers. One way of determining the distance of a point to other clusters of points is to explore the distance to its nearest neighbors. For each value of K, the Multivariate k-Nearest Neighbor Outliers utility displays a plot of the Euclidean distance from each point to its Kth nearest neighbor. You specify the largest value of K, denoted as k. Plots are provided for K = 1, 2, 3, 5, ..., k, skipping values by the Fibonacci sequence to avoid displaying too many plots.
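The Fibonacci-style skipping of K values can be sketched as a small helper. The function name is hypothetical, and the handling of a k that is not itself a Fibonacci number (here, k is always included as the last value) is an assumption for illustration.

```python
def plotted_k_values(k):
    """Return the values of K to plot: Fibonacci steps below k, then k.

    Always appending k as the final value is an assumption about how a
    non-Fibonacci upper bound would be handled.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    values = []
    a, b = 1, 2
    while a < k:
        values.append(a)
        a, b = b, a + b
    values.append(k)
    return values


print(plotted_k_values(8))  # [1, 2, 3, 5, 8]
```

With an upper bound of 8, this produces five plots instead of eight, and the saving grows quickly for larger bounds.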
This approach is sensitive to the specified value of k. A small value of k can fail to identify outliers, and a large value of k can falsely classify points as outliers:
Suppose that the specified k is small, so that you are only studying a few neighbors. If there is a cluster of more than k points that is far from the rest of the points, then the points within the cluster will have small distances to their nearest neighbors. You may be unable to detect the cluster of outliers.
Suppose that the specified k is large, so that you are studying a large number of neighbors. If there are clusters with fewer than k data points, then the points within these clusters may appear to be outliers. You may overlook the fact that the points form a cluster, interpreting the individual cluster members as outliers instead.
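The sensitivity to k described above can be reproduced with a small experiment. The sketch below is illustrative only: the function name and the data set (a 5 by 5 grid plus a distant group of three points) are made up for the example, and the brute-force search is chosen for clarity rather than speed.

```python
import math


def kth_neighbor_distances(points, k):
    """Euclidean distance from each point to its k-th nearest neighbor."""
    result = []
    for i, p in enumerate(points):
        # Distances to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


# Main cluster: a 5 x 5 grid with unit spacing.
grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
# A distant group of three points (hypothetical data).
far_group = [(20.0, 20.0), (20.0, 21.0), (21.0, 20.0)]
points = grid + far_group

for k in (1, 2, 3):
    d = kth_neighbor_distances(points, k)
    print(k, "grid max:", round(max(d[:-3]), 3),
          "far group min:", round(min(d[-3:]), 3))
```

For K = 1 and K = 2, the three distant points find their neighbors within their own group, so their distances look like those of the grid points and the group goes undetected. At K = 3, each of them must reach back to the grid, and all three stand out. Whether they should be read as three outliers or as one small cluster is the judgment that the text asks you to make.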
When you select Multivariate k-Nearest Neighbor Outliers from the list of commands, you are asked to specify the value of k to use as an upper bound for the furthest neighbor to be considered. The default value is 8.
The report shows plots for select values of K up to the value k. The value of K for each plot is displayed in its vertical axis label, which is of the form Distance to Neighbor K = <a>, where a is an integer denoting the ath closest neighbor. Each plot shows the distance from the point in the ith row to its ath nearest neighbor. The points that have large distances from their neighbors, across multiple values of K, are likely to be outliers.

Help created on 9/19/2017