Publication date: 07/30/2020

Robust estimates of parameters are less sensitive to outliers than non-robust estimates. Robust Fit Outliers provides several types of robust estimates of the center and spread of your data to determine those values that can be considered extreme.

Figure 20.7 Robust Fit Outliers Window

Given a robust estimate of the center and spread, outliers are defined as those values that lie more than K times the robust spread from the robust center. The Robust Fit Outliers window provides several options for calculating the robust estimates and the multiplier K, and it provides tools to manage the outliers that are found.

Huber

Uses Huber M-Estimation to estimate center and spread. This option is the default. See Huber and Ronchetti (2009).
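To illustrate the general idea of Huber M-estimation, the following Python sketch computes a Huber estimate of the center by iteratively reweighted averaging. It is not JMP's implementation. The function name `huber_location`, the tuning constant `c = 1.345`, and the use of a fixed scale based on the median absolute deviation (MAD) are all assumptions made for this example.

```python
import statistics

def huber_location(values, c=1.345, tol=1e-9, max_iter=100):
    """Huber M-estimate of location with a fixed, MAD-based scale.

    Assumptions for illustration only: c = 1.345 and a scale of
    1.4826 * MAD are common textbook choices, not documented JMP settings.
    """
    center = statistics.median(values)
    mad = statistics.median([abs(x - center) for x in values])
    scale = 1.4826 * mad  # makes the MAD comparable to a normal standard deviation
    if scale == 0:
        return center, scale  # more than half the values are identical
    cutoff = c * scale
    for _ in range(max_iter):
        # Values within the cutoff get full weight; distant values are down-weighted.
        weights = [
            1.0 if abs(x - center) <= cutoff else cutoff / abs(x - center)
            for x in values
        ]
        new_center = sum(w * x for w, x in zip(weights, values)) / sum(weights)
        converged = abs(new_center - center) < tol
        center = new_center
        if converged:
            break
    return center, scale

# Hypothetical data with one extreme value.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 25.0]
center, spread = huber_location(sample)
```

In this sketch the extreme value 25.0 receives a small weight, so the estimated center stays close to the bulk of the data instead of being pulled toward the mean.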

Cauchy

Assumes a Cauchy distribution to calculate estimates for the center and spread. Cauchy estimates have a high breakdown point and are typically more robust than Huber estimates. However, if your data are separated into clusters, the Cauchy fit tends to consider only the half of the data that lies in the more closely spaced clusters and to ignore the rest.
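One way to obtain estimates under a Cauchy assumption is maximum likelihood. The Python sketch below fits the Cauchy location and scale with the standard EM iteration for a t distribution with one degree of freedom. Whether JMP uses maximum likelihood or this iteration is not stated here, so treat the approach, the function name `cauchy_fit`, and the starting values as assumptions.

```python
import math
import statistics

def cauchy_fit(values, tol=1e-9, max_iter=500):
    """Maximum likelihood fit of a Cauchy location and scale via EM.

    Illustrative sketch only; not taken from JMP.
    """
    n = len(values)
    center = statistics.median(values)
    # Start from the median absolute deviation; fall back to 1.0 if it is zero.
    scale = statistics.median([abs(x - center) for x in values]) or 1.0
    for _ in range(max_iter):
        # E-step: weights shrink quickly for values far from the center.
        weights = [2.0 / (1.0 + ((x - center) / scale) ** 2) for x in values]
        # M-step: weighted mean and weighted root mean square deviation.
        new_center = sum(w * x for w, x in zip(weights, values)) / sum(weights)
        new_scale = math.sqrt(
            sum(w * (x - new_center) ** 2 for w, x in zip(weights, values)) / n
        )
        if new_scale == 0:
            return new_center, scale  # degenerate data; keep the last positive scale
        converged = abs(new_center - center) < tol and abs(new_scale - scale) < tol
        center, scale = new_center, new_scale
        if converged:
            break
    return center, scale

# Hypothetical data with one extreme value.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 25.0]
center, spread = cauchy_fit(sample)
```

Because the weights fall off with the squared distance from the center, a distant value contributes almost nothing to either estimate, which is consistent with the robustness described above.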

Quartile

Uses the interquartile range (IQR) to estimate the spread. The estimate for the center is the median. The estimate for the spread is the IQR divided by 1.34898, which is the IQR of a standard normal distribution. Dividing by this factor makes the spread estimate correspond to one standard deviation when the data are normally distributed.
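The Quartile calculation can be sketched in a few lines of Python. The function name `quartile_estimates` is hypothetical, and the quantile interpolation rule (`method="inclusive"`) is an assumption, so quartiles for small samples might differ slightly from the ones JMP reports.

```python
import statistics

IQR_OF_STANDARD_NORMAL = 1.34898  # divisor given in the Quartile description

def quartile_estimates(values):
    """Return (center, spread): the median and the IQR divided by 1.34898."""
    # Assumption: inclusive interpolation; other quantile rules give slightly
    # different quartiles on small samples.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    center = statistics.median(values)
    spread = (q3 - q1) / IQR_OF_STANDARD_NORMAL
    return center, spread

# Hypothetical data: quartiles are 3 and 7, so the IQR is 4.
center, spread = quartile_estimates([1, 2, 3, 4, 5, 6, 7, 8, 9])
```

For this sample the center is 5 and the spread is 4 / 1.34898, which is approximately 2.97.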

K

The multiplier that determines outliers: a value is treated as an outlier when it lies more than K times the spread away from the center. Large values of K flag fewer values and therefore provide a more conservative set of outliers than small values. The default is 4.
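The effect of K can be illustrated with a short Python sketch. The function name `flag_outliers`, the sample values, and the center and spread used here are hypothetical. The strict inequality is also an assumption about how values exactly on the boundary are treated.

```python
def flag_outliers(values, center, spread, k=4.0):
    """Return the values that lie more than k * spread from the center."""
    threshold = k * spread
    return [x for x in values if abs(x - center) > threshold]

# Hypothetical data with assumed robust estimates of center and spread.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 12.0, 25.0]
center, spread = 10.0, 0.2

default_k = flag_outliers(sample, center, spread)        # K = 4, threshold 0.8
large_k = flag_outliers(sample, center, spread, k=20.0)  # K = 20, threshold 4.0
```

With the default K of 4, both 12.0 and 25.0 are flagged. With K set to 20, only 25.0 is flagged, which shows how a larger K yields a more conservative set of outliers.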

Show only columns with outliers

Limits the list of columns in the report to those that contain outliers.

Once the report is displayed using your specifications, there are many ways to explore these extreme values. You can select the outliers in a row by selecting the specified row in the Robust Estimates and Outliers report.

Select Rows

Selects the rows containing outliers for the selected columns in the data table.

Exclude Rows

Sets the Exclude Row state for outliers in the selected columns in the data table. Click Rescan to update the Robust Estimates and Outliers report.

Color Cells

Colors the cells of the selected outliers in the data table.

Color Rows

Colors the rows containing outliers for the selected columns in the data table.

Add to Missing Value Codes

Adds the selected outliers to the missing value codes column property for the selected columns. Use this option to identify known missing value or error codes within the data. Click Rescan to update the Robust Estimates and Outliers report.

Note: Add to Missing Value Codes is not available with Robust Fit Outliers if a By variable is specified in the launch window.

Change to Missing

Changes the outlier value to a missing value in the data table. Click Rescan to update the Robust Estimates and Outliers report.

Rescan

Rescans the data after outlier actions have been taken.

Note: Hold down the Ctrl key and click Rescan to rescan across all command groups.

Close

Closes the Robust Fit Outliers panel.

Note: Hold down the Ctrl key and click Close to close all command windows.
